Preface


Jean Dieudonné has advanced the idea* of the good mixture of
mathematical spirit (essential to the well-being of mathematics) in
an individual. He differentiates between the tactician who faces a
problem with old and well-tested tools and the strategist who imbeds
the problem in a general theory with the result that the final solution
often appears as a triviality. Each of the two aspects will
suffer without the other, and the two must blend harmoniously as
was the case with Hilbert.

* Jean Dieudonné, Recent developments in mathematics, Amer. Math. Monthly,
71 : 3 (March 1964) 239-248.

With a little bit of tactics and a substantial amount of strategy,
the lectures of the third volume of this series again deal with the
broad and lively topics of modern mathematics. What has been
covered in these three volumes is a modest fraction of the universe of
mathematics, a subject that must continue its position in the front
ranks of knowledge with much unity and substantial variety.

In the preface to Volume I, I made the following observations,
which have overall significance for the series, and should perhaps be
repeated:

In our time the growth of mathematical literature, like that of all
the sciences, has burgeoned beyond the point where an individual
finds its content accessible. It has been shown that the number of
scientists and the number of published scientific papers have multiplied
by ten for every doubling of the general population, and
hence that the next decade or two will see the doubling of the
volume of all the existing literature produced until now. The man
who is not a specialist in a given area of research will find that the
sheer volume of published work makes him a stranger, and the
proverbial esotericism of mathematics has by now limited the possibility
of universalism to the very few.

For the future we are promised the applied marvels of data 
processing and information storage and retrieval. Nonetheless, it 
requires more than mechanical assimilation to unify and sift for 
significance the published knowledge of even a relatively small subfield
of mathematics. We are still obliged (and shall be for a long
visible future) to look to the mature specialist himself for such 
evaluation, summarization, and interpretation as a partial answer 
to the “paper explosion.” 

This is the third and final volume of the series of lectures jointly 
sponsored by the George Washington University and the Office of 
Naval Research, which by intelligent summaries further extends 
the horizons of accessible modern mathematics. The lectures 
were begun in the fall of 1962 at the Lisner Auditorium of the
George Washington University and continued at monthly intervals. 

Our intention, as stated in the preface to Volume I, was to invite 
each of the eminent men represented here to delineate a substantial 
research area, to describe it broadly and comprehensively for an 
audience of mathematicians who are not specialists in that area, and 
to contribute to this description his individual evaluation of the 
esthetic and practical aspects of the field, its position in mathematical
development as a whole, and its future, as that might be
implied in the conjectural exposition of its unsolved problems. 
The speakers have responded to a difficult challenge, compressing 
the enormous material at their command into a necessarily limited 
space, preserving the original spirit of the project in their informal 
approach, refraining from any deep and intricate excursions which 
might intimidate the tyro, and at the same time giving in each case 
the personal flavor of their own involvement in mathematical 
research. 

The outstanding success of the lecture series makes us confident 
that these books will be of interest to all mathematicians desiring to 
keep abreast of the major achievements in various fields, as well as 
to those in the general scientific public wanting to have a flavor of
the rapid and sophisticated development of the “Queen of the 
Sciences.” To the graduate student in mathematics embarking 
on his research career, it should be both useful and encouraging, 
providing glimpses of large areas to which he may not have been 
formally introduced and challenging him with unsolved problems. 

The last set of lectures was again organized and efficiently administered
by Professor David Nelson, who was aided by Professor
Norman Wiegmann. To both colleagues go our thanks and appreciation.


T. L. Saaty 
Editor 


Washington, D. C. 
February 1965 
1

Topics in Classical Analysis

Einar Hille


PROLOGUE 

If the purpose of these lectures is to describe the wave front of current
research in classical analysis, my efforts will fall woefully short
of the desired goal. In the time and space at my disposal I can only 
scratch the surface here and there with big gaps in between. Thus 
a choice of topics must be made and this is a highly subjective 
matter. It is said that one man's meat is another man's poison. 
A partial excuse for my choice of the poison is that some of the 
most succulent meat has already been consumed by other speakers 
in this series. Thus partial differential equations as well as complex
function theory form substantial chapters of classical analysis
and they have been treated expertly by other speakers. From what 
is left I have finally after much hesitation chosen the following 
topics: 

1. Functional inequalities. 

2. Functional equations. 

3. Mean values. 

4. Transfinite diameters.

5. Potential theory. 

Obviously the first two topics are closely related and the
last three also have much in common. Moreover, mean values 
as presented here center around a particular class of functional 
equations. 

Various types of functional inequalities arise in many connections. 
Convex and subadditive functions are important concepts based 
on such inequalities and so are subharmonic functions (not treated 
here). Much of the material presented is suggested by uniqueness 
theorems in the theory of ordinary differential equations. Classical 
complex function theory and renewal theory form other rich sources. 
There is no discussion of inequalities in the usual sense. For this 
topic we refer to the excellent tract by Beckenbach and Bellman [7] 
in the references at the end of this article. See also the monograph 
of Kazarinoff [86]. 

If functional inequalities are products of modern times, functional 
equations on the other hand have a long history, going back to 
Newton. In the present report we omit differential, difference, and 
integral equations and restrict ourselves to what might be called 
algebroid functional equations. Here, largely through the efforts 
of the Hungarian school, new life is sprouting and basic problems are 
being solved. Some of these are discussed here. 

Mean values and averaging processes have proved themselves 
powerful tools in analysis. Summability and ergodic theory are 
outstanding illustrations. Our modest aim here is to lay the 
foundation for the theory of transfinite diameters and to call attention
to some elementary consequences of the mean value property.
Other applications will suggest themselves to the reader. 

Transfinite diameters in their role as capacities are basic for 
potential theory. The latter is at present a very lively topic and 
could justify a report of its own. For a general survey we refer to 
the outstanding monograph of M. Tsuji [78], which covers the 
literature up to 1958. Aside from necessary background material 
we shall concentrate on recent work. 

FUNCTIONAL INEQUALITIES 

1.0. Orientation. In this chapter we shall consider various types
of inequalities between functions. The first four sections deal
with what we shall call, for the lack of a better name, restrictive
inequalities.

Let $F$ be a given class of functions $f$ on reals to reals defined, say,
on the interval $[0, a]$. Let $T$ be a mapping, not necessarily linear,
of $F$ into $F$ and consider the inequality

(1.0.1) $f(t) \le T[f](t), \quad 0 \le t \le a.$

Such an inequality will hold for a certain subset $F_0$ of $F$. If $F_0 = F$
we say that the inequality is trivial (with respect to $F$). If $F_0$ is a
proper subset of $F$, $F_0 \ne \emptyset$, then (1.0.1) is restrictive. In particular,
the inequality is determinative if it holds for one and only one element
of $F$. Finally, it is absurd if $F_0 = \emptyset$.

The four cases are illustrated by 

$\ldots, \qquad f(t) \le K \int_0^t f(s)\,ds, \qquad f(t) \le -1 - [f(t)]^2,$

where F is the set of non-negative continuous functions on [0, a]. 
Section 1.5 deals with some function theoretical inequalities 
which are usually of a different nature but are restrictive in the 
sense that the given inequality implies much stronger inequalities. 
Section 1.6 deals with inequalities on product spaces, convex and 
subadditive functions being typical instances. 

1.1. Determinative Inequalities. Such inequalities underlie the classical
uniqueness theorems in the theory of differential and integral
equations, and all the inequalities listed below come from this
source. Thus the classical Lipschitz condition leads to the following
determinative inequality:

Proposition 1. If $f$ is a bounded measurable function on $(0, a)$
and if

(1.1.1) $0 \le f(t) \le K \int_0^t f(s)\,ds, \quad 0 \le t \le a,$

then $f \equiv 0$.

There are three different types of proof available for this proposition
in the literature, and we shall take the liberty of reminding the
reader of them. Using the method of iteration, one substitutes
the inequality into itself repeatedly, and after $n$ steps one has

$f(t) \le \frac{K^{n+1}}{n!} \int_0^t (t - s)^n f(s)\,ds,$

and the conclusion is obvious, since $f$ is bounded and the right
member tends to zero as $n \to \infty$.

The second alternative is to proceed step by step. If $\theta$ is fixed,
$0 < \theta < 1$, and $B$ is the supremum of $f$ in $(0, \theta/K)$ then

$0 \le B \le K \cdot \frac{\theta}{K} B = \theta B$ and $B = 0$.

The same argument then applies to the interval $(\theta/K, 2\theta/K)$ and
so on.

Proof by integration is less elementary but lends itself better to
generalization. Here we set

$F(t) = \int_0^t f(s)\,ds.$

Then $F(0) = 0$ and for almost all $t$

$F'(t) = f(t) \le K F(t),$

so that

$\frac{d}{dt}\left[F(t) e^{-Kt}\right] \le 0$

and, since $F(0) = 0$,

$F(t) e^{-Kt} \le 0.$

Since also $F(t) \ge 0$, this gives $F(t) = 0$ for all $t$ and hence also $f(t) = 0$.
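To see how fast the iteration argument closes, one may replace $f$ by a constant bound $M$ in the right member after $n$ substitutions, which gives $f(t) \le M (Kt)^{n+1}/(n+1)!$. The short Python sketch below tabulates this bound. The values of $K$, $t$, $M$ and the helper name `iterated_bound` are illustrative assumptions and are not taken from the text.

```python
import math


def iterated_bound(K, t, M, n):
    """Bound M * (K t)^(n+1) / (n+1)! obtained from |f| <= M after n substitutions."""
    return M * (K * t) ** (n + 1) / math.factorial(n + 1)


# Hypothetical data: K = 3, t = 2, sup f <= M = 1.
bounds = [iterated_bound(K=3.0, t=2.0, M=1.0, n=n) for n in range(0, 60, 10)]
for n, b in zip(range(0, 60, 10), bounds):
    print(n, b)
```

The bound may grow for the first few $n$ when $Kt > 1$, but the factorial eventually dominates, which is all the argument needs.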

The same type of argument gives the following generalization. 

Proposition 2. With the same assumptions, let $K(t) > 0$ and
$K(t) \in L(0, a)$. Then

(1.1.2) $0 \le f(t) \le \int_0^t K(s) f(s)\,ds$ implies $f \equiv 0$.

The uniqueness theorem of M. Nagumo [61] leads to the following 
determinative inequality. 





Proposition 3. If $f \in C[0, a]$, if $f(0) = 0$, and if $f'(0)$ exists and
is 0, then

(1.1.3) $0 \le f(t) \le \int_0^t f(s)\,\frac{ds}{s}$ implies $f \equiv 0$.

Since $f(0) = f'(0) = 0$,

$F(t) = \int_0^t f(s)\,\frac{ds}{s}$

exists and $F'(0) = 0$. Hence

$F'(t) = t^{-1} f(t) \le t^{-1} F(t),$

and

$\frac{d}{dt}\left[t^{-1} F(t)\right] \le 0.$

Since $t^{-1} F(t)$ is decreasing and tends to zero with $t$, it follows that
$F(t) \le 0$ and this requires $F(t) \equiv 0$ and hence also $f(t) \equiv 0$.

A more delicate argument is involved in the uniqueness theorem of 
W. F. Osgood [64]. 

Proposition 4. Suppose that $\omega(u)$ is defined for $0 \le u$, $\omega(0) = 0$,
$\omega(u)$ is continuous and strictly increasing, and

(1.1.4) $\int_\delta^1 [\omega(u)]^{-1}\,du \to \infty$ as $\delta \downarrow 0$.

If $f \in C[0, a]$, then

(1.1.5) $0 \le f(t) \le \int_0^t \omega[f(s)]\,ds$ implies $f \equiv 0$.

Here we introduce

$g(t) = \max_{0 \le s \le t} f(s)$

and note that $f(t) \le g(t)$ while for each $t$ there is a $t_1 \le t$ such that

$f(t_1) = g(t).$

We may assume that $g(t) > 0$ for $t > 0$. It is then nondecreasing
and so is $\omega[g(t)]$. Then

$f(t) \le g(t) = f(t_1) \le \int_0^{t_1} \omega[f(s)]\,ds \le \int_0^t \omega[f(s)]\,ds \le \int_0^t \omega[g(s)]\,ds,$

so that

$g(t) \le \int_0^t \omega[g(s)]\,ds \equiv G(t),$

where $G(t)$ is strictly increasing. We have now

$G'(t) = \omega[g(t)] \le \omega[G(t)],$

so that

$\frac{G'(t)}{\omega[G(t)]} \le 1,$

and

$\int_\delta^a \frac{G'(t)\,dt}{\omega[G(t)]} < a.$

On the other hand, $G(t)$ is strictly increasing and differentiable with
a continuous derivative, so if we set $u = G(t)$, we get

$\int_{G(\delta)}^{G(a)} \frac{du}{\omega(u)} < a.$

Since $G(\delta) \to 0$ with $\delta$, this is a contradiction and $g(t) = f(t) \equiv 0$.
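The role of the divergence condition (1.1.4) can be seen from a choice of $\omega$ for which it fails. As a sketch, take $\omega(u) = \sqrt{u}$ (our assumption, chosen only for illustration): the integral of $1/\omega$ from $\delta$ to 1 equals $2(1 - \sqrt{\delta})$ and stays bounded as $\delta \downarrow 0$, and $f(t) = t^2/4$ satisfies (1.1.5) with equality without vanishing. The upper limit 1 and the helper `midpoint_integral` are likewise assumptions of this example.

```python
import math


def midpoint_integral(func, lo, hi, steps=20000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def f(t):
    """Candidate nonzero solution for omega(u) = sqrt(u)."""
    return t * t / 4.0


t = 1.5
# Right member of (1.1.5) with omega(u) = sqrt(u).
rhs = midpoint_integral(lambda s: math.sqrt(f(s)), 0.0, t)

# Closed form of the integral of u^(-1/2) from delta to 1: bounded as delta -> 0.
osgood_integrals = [2.0 * (1.0 - math.sqrt(d)) for d in (1e-2, 1e-4, 1e-8)]

print(f(t), rhs, osgood_integrals)
```

By contrast, $\omega(u) = Ku$ gives $\int_\delta^1 du/(Ku) = K^{-1} \log(1/\delta) \to \infty$, which is the Lipschitz case of Proposition 1.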

1.2. Maximal Elements. We turn now to inequalities that are
restrictive and usually not determinative. Here the natural problem
is to determine the subset $F_0$ of $F$ on which the inequality is
valid. In the general case very little seems to be known about this
problem and the best we can do is to find some properties of $f$ that
are implied by the inequality. In the present section we shall
assume $F$ to be partially ordered and try to find the maximal
element of $F_0$ under somewhat restrictive assumptions on the
transformation $T$ of (1.0.1).

In the following $C[0, a]$ will be the underlying space and $F$ shall
be the positive cone $C^+[0, a]$ of the space. For convenience we
include the zero element in $C^+$. $F$ is partially ordered under the
convention that $f_1 \le f_2$ if $f_2 - f_1 \in C^+$. Let $T$ be a transformation
of $C[0, a]$ into itself which is order preserving, that is,

(1.2.1) $f_1 \le f_2$ implies $T[f_1] \le T[f_2]$.

Such a transformation is obviously positive in the sense that it maps
$C^+$ into itself. We further assume that $T$ is a contraction.
$C^+$ is a complete metric space under the sup norm. To say that
$T$ is a contraction means that there exists a constant $k$, $0 < k < 1$,
such that

(1.2.2) $\|T[f_1] - T[f_2]\| \le k \|f_1 - f_2\|.$

Hence the classical Banach-Caccioppoli fixed-point
theorem for such mappings applies (see also F. F. Bonsall [89]).
There is a unique element $f_0$ of $C^+$ such that, if $f$ is any element of
$C^+$, then

(1.2.3) $\lim_{n \to \infty} T^n[f] = f_0$

and

(1.2.4) $T[f_0] = f_0.$

This gives the following proposition.

Proposition 5. If $T$ is an order preserving contraction on $C^+[0, a]$
to itself with fixed point $f_0$, then

(1.2.5) $f \le T[f]$ implies $f \le f_0$.

For we have

$f \le T[f] \le T^2[f] \le \cdots \le T^n[f] \le \lim_{n \to \infty} T^n[f] = f_0.$

Thus we have found the maximal element of $F_0$. Ordinarily it
is not true that

$F_0 = \{ f \mid f \le f_0 \},$

so we are still far from a complete description of $F_0$. The inequality
becomes determinative if and only if $f_0$ is the zero element. It
should be noted that if the first inequality in (1.2.5) is reversed so
is the second.

In the theory of differential equations it is customary to convert 
the equation into an integral equation and to use the latter for a 
priori estimates of the rate of growth of the solutions. A basic 
tool is the following proposition which we give essentially in the 
formulation of Bourbaki [8, p. 10]. 
Proposition 6. Let $f$ and $g \in C^+[0, a]$, let $K(t) > 0$, and $K(t) \in
L(0, a)$. Then the inequality

(1.2.6) $f(t) \le g(t) + \int_0^t K(s) f(s)\,ds$

implies that

(1.2.7) $f(t) \le f_0(t) \equiv g(t) + \int_0^t g(s) K(s) \exp\left[\int_s^t K(u)\,du\right] ds.$

If $T[f]$ is defined by the right member of (1.2.6), we see that

$f_0 = T[f_0]$

and $f_0$ is the only fixed point of the transformation. This is a linear
transformation on $C^+$ to itself and hence order preserving. It is a
contraction if and only if

$\int_0^a K(s)\,ds = k < 1.$

For the general case where this condition is not satisfied, we can
use the method of integration sketched above. Even in the simplest
case, $K = 1$, it does not seem clear that all of $F$ with $f \le f_0$ belongs
to $F_0$.
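Propositions 5 and 6 can be checked numerically in the simplest case. The sketch below assumes $g = 1$, $K = 1$ and $a = 1/2$ (so that $k = 1/2 < 1$), for which (1.2.7) gives $f_0(t) = e^t$. It discretizes $T$ by a cumulative trapezoidal rule, iterates $T^n[0]$, and tests the candidate $f(t) = 1 + t$, which satisfies $f \le T[f]$. The grid size, the helper `apply_T`, and the candidate are our own choices and are not part of the text.

```python
import math


def apply_T(f, g, K, h):
    """T[f](t_i) = g(t_i) + integral_0^{t_i} K(s) f(s) ds, by cumulative trapezoid."""
    out = [g[0]]
    acc = 0.0
    for i in range(1, len(f)):
        acc += 0.5 * h * (K[i - 1] * f[i - 1] + K[i] * f[i])
        out.append(g[i] + acc)
    return out


a, n = 0.5, 2000
h = a / n
ts = [i * h for i in range(n + 1)]
g = [1.0] * (n + 1)
K = [1.0] * (n + 1)

# Successive approximations T^n[0], which should approach f_0(t) = exp(t).
f = [0.0] * (n + 1)
for _ in range(40):
    f = apply_T(f, g, K, h)
f0 = [math.exp(t) for t in ts]
fixed_point_error = max(abs(u - v) for u, v in zip(f, f0))

# A candidate with f <= T[f]; Proposition 5 then predicts f <= f_0.
candidate = [1.0 + t for t in ts]
image = apply_T(candidate, g, K, h)
satisfies_inequality = all(c <= Tc + 1e-12 for c, Tc in zip(candidate, image))
below_f0 = all(c <= v + 1e-12 for c, v in zip(candidate, f0))

print(fixed_point_error, satisfies_inequality, below_f0)
```

The check only confirms that this one element of $F_0$ lies below $f_0$; it says nothing about whether every $f \le f_0$ belongs to $F_0$, which is the question left open above.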

Propositions 1 and 2 are special cases of 6. Here the inequality
is determinative since $f_0$ is the zero element. The transformations
$T$ of Propositions 3 and 4 also have the zero element as their only
fixed point.

The special case where T is a convolution will be discussed in 
Section 1.4. 

1.3. Bounds. Many problems in analysis call for nontrivial 
lower as well as upper bounds. The following is an example of 
such a problem where the bounds are obtained by using a class of 
simultaneously valid restrictive inequalities. 

Consider a linear differential equation 

(1.3.1) W'(t) = A(t)W(t), 

where A and W are n-by-n matrices, more generally elements of a 
complex noncommutative Banach algebra. It is supposed that 
$A(t)$ is continuous for $0 < t \le a$ and it is required to find bounds for
$W(t)$ as $t \downarrow 0$.
Proposition 7. Under the stated assumptions

(1.3.2) $\|W(a)\| \exp\left[-\int_t^a \|A(s)\|\,ds\right] \le \|W(t)\| \le \|W(a)\| \exp\left[\int_t^a \|A(s)\|\,ds\right].$

This is obtained from

$W(\beta) = W(\alpha) + \int_\alpha^\beta A(s) W(s)\,ds,$

valid for $0 < \alpha < \beta \le a$. Setting

$\|W(s)\| = f(s), \quad \|A(s)\| = K(s),$

we obtain

$f(\beta) \le f(\alpha) + \int_\alpha^\beta K(s) f(s)\,ds.$

This is of type (1.2.6) and gives

$f(\beta) \le f(\alpha) \exp\left[\int_\alpha^\beta K(s)\,ds\right].$

Setting $\alpha = t$, $\beta = a$, we get the first half of (1.3.2). To get the
second half we operate with

$f(\alpha) \le f(\beta) + \int_\alpha^\beta K(s) f(s)\,ds$

and use the method of integration.

The argument extends immediately to the complex plane. In
particular we see that $\|W(t)\|$ is bounded away from 0 and $\infty$, if
$\|A(t)\| \in L(0, a)$. Taking

(1.3.3) $A(t) = t^{-p-1} B(t), \quad B(t)$ bounded,

we get the classical estimates for regular singular points if $p = 0$
and for irregular singular points of rank $p$ if $p$ is a positive integer.

If, instead, $W(t)$ is known to be continuous for $a \le t < \infty$, then
we replace (1.3.2) by

(1.3.4) $\|W(a)\| \exp\left[-\int_a^t \|A(s)\|\,ds\right] \le \|W(t)\| \le \|W(a)\| \exp\left[\int_a^t \|A(s)\|\,ds\right].$

If, in particular,

(1.3.5) $\lim_{t \to \infty} \|W(t)\| \exp\left[\int_a^t \|A(s)\|\,ds\right] = 0$, then $W(t) \equiv 0$.

The following example belongs to a different range of ideas. 
Suppose that P(u) is a polynomial in u of even degree and real 
coefficients such that the equation 

P(u) — u = 0 

has exactly two real roots, $r_1$ and $r_2$, and that $P(u) - u$ is positive
outside the interval $(r_1, r_2)$. Then clearly the inequality

(1.3.6) $f(t) \ge P[f(t)]$ implies $r_1 \le f(t) \le r_2$

for all $t$.
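A concrete instance makes the observation tangible. The sketch below assumes the polynomial $P(u) = u^2 - 2$ (our choice, not from the text), for which $P(u) - u = (u - 2)(u + 1)$ has the two real roots $r_1 = -1$ and $r_2 = 2$ and is positive outside $(r_1, r_2)$. It scans a grid for the values $u$ with $P(u) - u \le 0$, the only values a function subject to the inequality can take.

```python
def P(u):
    """Hypothetical even-degree polynomial with P(u) - u = (u - 2)(u + 1)."""
    return u * u - 2.0


# Grid on [-5, 5] with step 0.01; keep the values where P(u) - u <= 0.
grid = [i / 100.0 for i in range(-500, 501)]
admissible = [u for u in grid if P(u) - u <= 0.0]
lo, hi = min(admissible), max(admissible)

print(lo, hi)
```

On this grid the admissible values fill exactly the interval between the two roots.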

A variant of this simple observation was used by Hille [39] in a
discussion of the asymptotic behavior of the function $m(\lambda, a)$ introduced
by H. Weyl in his discussion of singular boundary value
problems. Stripped of unessential features the inequality can be
written

(1.3.7) |P(f)| ^ |3f[F(0]| ^ t [|P(<) - a| 2 

- j | F(t) - a| | F(t) ~b\+~ \F(t) - &| 2 J. 

Here $F(t)$ is a complex-valued continuous function on $[1, \infty)$,
$a$, $b$, $A$, $B$ are real numbers, $A > 0$, $B > 0$, $(A - 2B)|b - a| > -1$.
From this inequality we conclude that $|F(t)|$ must stay bounded as
$t \to \infty$; this being the case, $\lim F(t)$ must exist and equal $a$. We
then set

(1.3.8) $F = a + \frac{G}{t},$

and note that the imaginary part of $F$ equals $t^{-1}$ times the imaginary
part of $G$. This gives a new inequality for $g = |G|$, namely

(1.3.9) g* - Ag (y + ^ + B* (y - ?Y, 


$\gamma = |b - a|$.
Here the condition $(A - 2B)\gamma + 1 > 0$ ensures that the corresponding
quadratic equations have positive roots bounded away
from 0 and $\infty$. It follows that $G(t)$ is bounded away from 0 and $\infty$
for all $t$, which was the desired result.

1.4. Convolution Inequalities. We return to inequalities of the
type considered in Section 1.2. $T$ is now supposed to be linear, more
precisely a convolution.

Let $G$ be a locally compact Abelian additive group and $\mu$ a positive
regular measure of total mass one on $G$. Let $F$ be the class of real-valued
continuous functions $f$ on $G$ such that for each $x$

(1.4.1) $(f * \mu)(x) = \int_G f(x - s)\,d\mu(s)$

converges absolutely. Let $F_0$ be the subclass of $F$ such that on $G$

(1.4.2) $f - f * \mu \le 0.$

It will be supposed that $\mu$ has no mass at the origin. Let $G(\mu)$ be
the closed subgroup of $G$ generated by the elements in the support of
$\mu$. $G$ is normally taken to be either $R^n$ or $Z^n$.

Such inequalities are of some interest to the theory of probability,
especially for renewal theory. This aspect of the question
has been treated in recent papers by Choquet and Deny [16],
W. Feller [28], Feller and Orey [29], Hoeffding [42], Karlin [48],
and Spitzer [74]. The results listed below are in the main taken
from the Uppsala dissertation of Matts Essén [21] which is devoted
to convolution inequalities per se.

In the present case $\|T\| = 1$ so Proposition 5 does not apply and
the fixed points of the transformation are neither unique nor of
special importance for the problem at hand. The bounded solutions
of

(1.4.3) $f = f * \mu$

have been found by Choquet and Deny [16]. They are periodic
functions whose periods contain the elements of the support of
$\mu$. In particular, if $G(\mu) = G$, then any constant is a solution
of (1.4.3) and there are no other bounded solutions. There may
be unbounded solutions, however. Thus if

(1.4.4) $G = G(\mu) = R^1, \quad \int_{-\infty}^{\infty} |x|\,d\mu(x) < \infty, \quad \int_{-\infty}^{\infty} x\,d\mu(x) = 0,$

then any function of the form $f = Ax + B$, where $A$, $B$ are arbitrary
constants, is a solution of (1.4.3). Such solutions are referred
to as trivial solutions of (1.4.2) in the following.
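That linear functions solve (1.4.3) when the first moment vanishes follows from $(f * \mu)(x) = Ax + B - A \int s\,d\mu(s)$. The sketch below verifies this in exact arithmetic for a discrete analogue on $G = Z^1$, with a measure placing mass 2/3 at $-1$ and mass 1/3 at 2. The measure, the constants $A$, $B$, and the helper names are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical probability measure on the integers with zero mean and no mass at 0.
mu = {-1: Fraction(2, 3), 2: Fraction(1, 3)}


def convolve(f, x):
    """(f * mu)(x) = sum over s of f(x - s) mu({s})."""
    return sum(mass * f(x - s) for s, mass in mu.items())


A, B = Fraction(5), Fraction(-7, 2)


def linear(x):
    """Trivial solution f(x) = A x + B."""
    return A * x + B


mean = sum(mass * s for s, mass in mu.items())
residuals = [linear(x) - convolve(linear, x) for x in range(-10, 11)]

print(mean, set(residuals))
```

Since $-1$ and 2 generate all of $Z^1$, the condition $G(\mu) = G$ holds for this example as well.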

Let $G = R^n$ or $Z^n$ and assume that $G(\mu) = G$.

Proposition 8. If $n = 1$ and if $\int_G |x|\,d\mu(x) < \infty$, then (1.4.2)
has a nontrivial bounded solution if and only if $\int_G x\,d\mu(x) \ne 0$.

If $n = 2$ and if $\int_G |x|^{1+\alpha}\,d\mu(x) < \infty$ for some positive $\alpha$, then a
sufficient condition for the existence of solutions is the existence of a
linear function $l(x) = a x_1 + b x_2$ such that

$\int_G l(x)\,d\mu(x) \ne 0$

and this condition is also necessary if $\alpha = 1$ is admissible.

If $n \ge 3$ nontrivial solutions always exist.

Essén proves the following properties of his solutions.

Proposition 9. Let $G = G(\mu) = R^1$ and assume that

(1.4.5) $\int_{-\infty}^{\infty} |x|\,d\mu(x) < \infty, \quad \int_{-\infty}^{\infty} x\,d\mu(x) = m \ne 0.$

If a bounded function $f$ satisfies (1.4.2) then

(1.4.6) $f - f * \mu \in L(R^1).$

If $f$ is also slowly decreasing, then $f(x)$ tends to limits as $x \to \pm\infty$ and

(1.4.7) $f(\infty) - f(-\infty) = \frac{1}{m} \int_{-\infty}^{\infty} [f - f * \mu](x)\,dx.$

If the left member is zero, then $f$ satisfies (1.4.3).

These results may be sharpened if $\mu$ satisfies more restrictive
conditions. We state the results for $n = 1$, $G = R^1$.

Proposition 10. Let $G = G(\mu) = R^1$ and let there be an $\alpha$, $0 \le
\alpha \le 1$, such that

$\int_{-\infty}^{\infty} |x|^{1+\alpha}\,d\mu(x) < \infty,$

while

$\int_{-\infty}^{\infty} x\,d\mu(x) = 0.$

Let $f$ be a solution of (1.4.2). If $f(x) = O(|x|^\alpha)$, then $f - f * \mu \in
L(R^1)$. If instead $f(x) = o(|x|^\alpha)$, then $f$ satisfies (1.4.3). If $\alpha = 1$
is admissible and if $f(x)/x$ is bounded for $|x| > 1$ and slowly oscillating,
then $f(x)/x$ tends to finite limits when $x \to \pm\infty$ and

(1.4.8) $\lim_{x \to +\infty} \frac{f(x)}{x} - \lim_{x \to -\infty} \frac{f(x)}{x} = -\frac{2}{\sigma^2} \int_{-\infty}^{\infty} [f - f * \mu](x)\,dx, \quad \sigma^2 = \int_{-\infty}^{\infty} x^2\,d\mu(x).$

Here the terms “slowly decreasing” and “slowly oscillating” 
have their usual significance in the theory of Tauberian theorems, 
and the proofs of limit results are based on the Tauberian theorem of 
Norbert Wiener [83]. 

1.5. Some Function Theoretical Inequalities. Analytic function 
theory presents many inequalities which are restrictive in the sense 
that they imply stronger inequalities which must be satisfied by the 
function. Elsewhere [37, I, p. 200] I have called attention to 
the principle of moderation which underlies the behavior of analytic 
functions. It implies that a strong local constraint imposed on 
such a function either becomes a global constraint or the function 
must compensate by being strongly unbounded. 

The principle of the maximum is a case in point: If $f(z)$ is holomorphic
in a bounded domain $D$ and if

(1.5.1) $\limsup |f(z)| \le M$

whenever $z$ approaches a point of $\partial D$, then $|f(z)| \le M$ everywhere in
$D$. The condition becomes determinative if we add the constraint

$f(z_0) = M e^{i\alpha},$

$\alpha$ real, $z_0 \in D$, for then $f(z) \equiv M e^{i\alpha}$.
Here there is no alternative. But if we drop the assumption that
$D$ is bounded and assume that (1.5.1) holds merely for approach
to finite boundary points, then either $|f(z)| \le M$ in $D$ or $f(z)$ is
unbounded.

The theorem of Liouville may also be regarded as a restrictive
inequality: If $f(z)$ is entire and if

(1.5.2) $M(r; f) \equiv \max_\theta |f(re^{i\theta})| = o(r^{n+1}),$

then $f(z)$ is a polynomial of degree $\le n$. Here we can replace
(1.5.2) for instance by

(1.5.3) $M_2(r; f) \equiv \left[\frac{1}{2\pi} \int_0^{2\pi} |f(re^{i\theta})|^2\,d\theta\right]^{1/2} = o(r^{n+1}).$

A deeper result is that due to M. H. Stone [75] who showed that the
same conclusion is valid if there exists a non-negative trigonometric
polynomial $P(\theta)$ such that

(1.5.4) $P(\theta) |f(re^{i\theta})| = o(r^{n+1}),$

uniformly in $\theta$. For a proof see also Hille and Phillips [41].

It is not our intention to make an inventory of restrictive inequalities
in complex function theory; however, we shall mention some
which govern the behavior of functions holomorphic in the right
half-plane. Suppose that

(1.5.5) $\limsup_{x \downarrow 0} |f(x + iy)| \le 1$

for all $y$. Under what further restrictions can we conclude that
$|f(z)| \le 1$ in the right half-plane? In 1904 Phragmén [67] showed
that

(1.5.6) $\lim_{r \to \infty} r^{-1} \log M(r; f) = 0$

is a necessary and sufficient condition. This was generalized four
years later by Phragmén in collaboration with Lindelöf [68], but we
shall not digress on these results. We proceed to the next milestone,
the work of F. and R. Nevanlinna [63]. They introduced the
function

(1.5.7) $m(r; f) = \int_{-\pi/2}^{\pi/2} \log^+ |f(re^{i\theta})| \cos\theta\,d\theta$

and proved

Proposition 11. If (1.5.5) holds, then either $|f(z)| \le 1$ or

(1.5.8) $\liminf_{r \to \infty} r^{-1} m(r; f) > 0.$

This was in 1922. In 1937 Ahlfors [4] proved that $r^{-1} m(r; f)$ is
nondecreasing so that “lim inf” can be replaced by “lim.” This
limit may be $+\infty$ however. In 1946 M. Heins [35] gave a simpler
proof, and he also proved the two following propositions.

Proposition 12. With $0 \le \lambda \le \infty$

(1.5.9) $\lim_{r \to \infty} r^{-1} \log M(r; f) = \lambda$

exists. If $0 < \lambda < \infty$, then either

(1.5.10) $\log M(r; f) < \lambda r$

for all $r$ or $f(z) = C e^{\lambda z}$ with $|C| = 1$.

Proposition 13. The limits

(1.5.11) $\lim_{r \to \infty} r^{-1} \log M(r; f) = \lambda, \quad \lim_{r \to \infty} r^{-1} m(r; f) = \nu$

are cofinite and if finite then

(1*5.12) „ - 

We also mention another result of Heins [36], later extended by
B. Kjellberg [49] in whose formulation it reads:

Proposition 14. Let $f(z)$ be a nonconstant entire function and let
$0 < \alpha < 1$. Then either the logarithm of the minimum modulus
exceeds $\cos \pi\alpha \cdot \log M(r; f)$ for arbitrarily large values of $r$ or

(1.5.13) $g_\alpha(r) = r^{-\alpha} \log M(r; f)$

tends to a positive limit or to $+\infty$ when $r \to \infty$.
The case $\alpha = \frac{1}{2}$ was that considered by Heins. The point of main
interest to us here is a convolution inequality used by Kjellberg in
his proof. He found a positive kernel $K_\alpha(s, r)$ with

$\int_0^\infty K_\alpha(s, r)\,ds = 1$

such that

(1.5.14) $g_\alpha(r) < \int_0^\infty g_\alpha(s) K_\alpha(s, r)\,ds$

and from this he could conclude the existence of $\lim g_\alpha(r)$.

With a suitable change of variables, this inequality can be reduced
to the type considered by Matts Essén (see Section 1.4) and Kjellberg's
theorem becomes a consequence of Proposition 9. This was
pointed out by Essén in [22].

1.6. Functional Inequalities on Product Spaces. We shall be concerned
with some inequalities of the general type

(1.6.1) $f[F(x, y)] \le H[f(x), f(y)],$

where $F$ and $H$ are given functions. Here $H$ is a function on
$R^1 \times R^1$ to $R^1$. In general, $x$ and $y$ are points in a locally compact
space $S$ and $f$ is a mapping on $S$ to $R^1$ while $F$ is a mapping of
$S \times S$ to $S$. In the most important applications, $S$ is either $R^1$ or
$R_+^1$, the set of positive numbers. We shall consider some particular
cases.

I. Convex functions. Here $S = R^1$,

$F(x, y) = \frac{1}{2}(x + y), \quad H(u, v) = \frac{1}{2}(u + v),$

so the inequality becomes

(1.6.2) $f\left(\frac{x + y}{2}\right) \le \frac{1}{2}[f(x) + f(y)].$

This inequality, first considered by O. Hölder [44] and in more detail
by Jensen [46] in 1906, characterizes convex functions. Geometrically,
(1.6.2) means that the middle point of any chord of the
curve $y = f(x)$ lies above or on the curve. If $f$ is bounded above and
convex in some interval $(a, b)$, then $f$ is continuous in $(a, b)$ and has
continuous left- and right-hand derivatives everywhere and these
derivatives are increasing functions of $x$. If $f$ is twice differentiable,
then $f''(x) \ge 0$ is necessary and sufficient for $f$ to be convex.
We refer the reader to Hardy, Littlewood, and Pólya [34] for further
information.

II. Subadditive functions. Here $S = R^1$ or $R_+^1$,

$F(x, y) = x + y, \quad H(u, v) = u + v,$

and the inequality is

(1.6.3) $f(x + y) \le f(x) + f(y).$

The case where $f$ is also positive-homogeneous goes back to H.
Minkowski [58] in 1896; the general theory was developed by Hille
and extensions to $R^n$ by R. A. Rosenbaum [72]. If $f$ is measurable
and different from $+\infty$, then $f$ is bounded above on any compact
set $K$ of $S$. If $f$ is also different from $-\infty$, then $f$ is bounded on any
$K$. Any decreasing function satisfies (1.6.3) in $R_+^1$. It follows
that if $S = R_+^1$, then $f$ can tend arbitrarily fast to $+\infty$ as $x \downarrow 0$
and arbitrarily fast to $-\infty$ as $x \to +\infty$. If $S = R^1$ and $f$ is finite,
then

(1.6.4) $-\infty < \alpha = \lim_{x \to -\infty} \frac{f(x)}{x} = \sup_{x < 0} \frac{f(x)}{x} \le \inf_{x > 0} \frac{f(x)}{x} = \lim_{x \to \infty} \frac{f(x)}{x} = \beta < \infty.$

Subadditive functions play a basic role in the analytical theory 
of semigroups. For further details see Hille and Phillips [41]. 
We add that any modulus of continuity is subadditive. 
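The relations in (1.6.4) are easy to observe on an explicit subadditive function. The sketch below assumes $f(x) = |x| + |\sin x|$ on $R^1$, which is subadditive because both $|x|$ and $|\sin x|$ are; for it $\alpha = -1$ and $\beta = 1$. The function, the grid, and the sample points $\pm 10^6$ are our own illustrative choices.

```python
import math


def f(x):
    """Hypothetical subadditive function on the real line."""
    return abs(x) + abs(math.sin(x))


# Check f(x + y) <= f(x) + f(y) on a grid in [-20, 20].
pts = [i * 0.25 for i in range(-80, 81)]
subadditive = all(f(x + y) <= f(x) + f(y) + 1e-12 for x in pts for y in pts)

# Ratios f(x)/x: bounded below by beta = 1 for x > 0, above by alpha = -1 for x < 0.
pos_ratios = [f(x) / x for x in pts if x > 0]
neg_ratios = [f(x) / x for x in pts if x < 0]

# Crude estimates of the limits at +infinity and -infinity.
beta_estimate = f(1e6) / 1e6
alpha_estimate = f(-1e6) / (-1e6)

print(subadditive, min(pos_ratios), max(neg_ratios), alpha_estimate, beta_estimate)
```

Here the infimum over $x > 0$ and the supremum over $x < 0$ are attained at the multiples of $\pi$, and they coincide with the limits at $\pm\infty$.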

C. T. Ionescu Tulcea [45] has considered an important generalization
of (1.6.3), namely

(1.6.5) $f(x + y) \le g(x) + h(y).$

If $f$, $g$, $h$ cannot take the value $+\infty$ and if $h$ is measurable, then $f$ is
bounded above on compact sets.

III. Suboperative functions. Here $S$ is arbitrary, $H(u, v) = u + v$,
and $F(x, y) = x \circ y$, an associative composition defined on $S \times S$.
Thus the inequality is

(1.6.6) $f(x \circ y) \le f(x) + f(y).$

A particular case occurs in Hille-Phillips [41]; the general theory
with many extensions is due to C. T. Ionescu Tulcea [45] who also
considered the inequality

(1.6.7) $f(x \circ y) \le g(x) + h(y).$

These functions play a basic role in the theory of Lie semigroups
of operators.

IV. Non-Archimedean valuations. $S$ is in general a field or
a ring, but we shall restrict ourselves to $S = R^1$, $R_+^1$, or $Z_+^1$,
where $Z_+^1$ denotes the set of positive integers. Here

$F(x, y) = x + y, \quad H(u, v) = \max(u, v),$

and the inequality is

(1.6.8) $f(x + y) \le \max[f(x), f(y)].$

If $S$ is $R_+^1$ or $Z_+^1$, then any decreasing function of $x$ satisfies
(1.6.8). This means that in $R_+^1$ the function $f$ may tend arbitrarily
fast to $+\infty$ when $x \downarrow 0$ and arbitrarily fast to $-\infty$ when
$x \to +\infty$. It is bounded above on compact subsets of $R^1$ if
$f \ne +\infty$.

Besides its basic role in valuation theory (see, for instance,
Albert [5]), this inequality plays a role for the so-called Čebyšev
constant. See Section 4.2 of this chapter.

There are obvious extensions to inequalities of the form

(1.6.9) $f(x \circ y) \le \max[f(x), f(y)]$

which may not have been explored.

FUNCTIONAL EQUATIONS 

2.0. Orientation. Functional equations have fascinated mathematicians
at least since the days of Euler who encountered such
equations when he introduced homogeneous functions. They have
a tendency to arise in the most varied connections: parallelogram
of forces (d'Alembert, Poisson, Cauchy), associativity (Abel),
non-Euclidean geometry (Bolyai and Lobačevskiĭ), optics (Stokes),
determinants and addition theorems (Weierstrass), and so on.

What is meant by a functional equation is still debatable
even if we agree to exclude differential, difference, and integral
equations. The classification problem presents many difficulties.
It used to be said that every functional equation requires its own
mode of attack. In recent years the situation has improved.
Gradually more general results are available, the classically known
solutions are shown to be valid under less severe restrictions,
existence proofs applicable to a wide range of equations are being
found, and uniqueness questions can be settled.

For a general account of the theory and a number of its applications,
the reader is referred to the excellent monograph by J. Aczél
[2]. The following pages are intended to call attention to some
recent results. Again the choice of material is arbitrary and reflects
the author's interests. Much of it is inspired by the conferences on
functional equations held in Oberwolfach in 1962 and 1963.

There are four sections: Measurability and Continuity, A General
Method, Uniqueness Theorems, and Miscellanea.

2.1. Measurability and Continuity. In the early work on functional
equations the assumption was usually made, implicitly or
explicitly, that the solution be a continuous function. Thus,
in his discussion of the functional equation

(2.1.1)    $f(x + y) = f(x) + f(y),$

Cauchy [13] in 1821 assumed $f$ to be continuous everywhere and
could then show that 

f(x) = cx, 

where c is an arbitrary constant. Under the same assumption he 
could also solve the other three equations which are associated with 
his name: 

(2.1.2) f(xy) = f(x) + f(y), f(x + y) = f(x)f(y), 

f(xy) = f(x)f(y). 
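As a quick numerical illustration of the four Cauchy equations, the sketch below plugs in the standard continuous solution families. Only $f(x) = cx$ is stated in the text; the families $c \log x$, $e^{cx}$, and $x^c$ (for positive $x$) are the usual textbook solutions and are supplied here as an assumption, as are the constant `c` and the sample points.

```python
import math

c = 1.7  # arbitrary constant chosen for the illustration

# Each entry: (candidate solution f, composition of arguments, combination of values).
solutions = {
    "f(x+y) = f(x)+f(y)": (lambda x: c * x,           lambda x, y: x + y, lambda u, v: u + v),
    "f(xy)  = f(x)+f(y)": (lambda x: c * math.log(x), lambda x, y: x * y, lambda u, v: u + v),
    "f(x+y) = f(x)f(y)":  (lambda x: math.exp(c * x), lambda x, y: x + y, lambda u, v: u * v),
    "f(xy)  = f(x)f(y)":  (lambda x: x ** c,          lambda x, y: x * y, lambda u, v: u * v),
}

def residual(f, compose, combine, x, y):
    """Absolute difference between the two sides of the functional equation."""
    return abs(f(compose(x, y)) - combine(f(x), f(y)))

points = [(0.3, 2.5), (1.0, 1.0), (4.2, 0.7)]  # positive arguments only
max_residual = max(
    residual(f, comp, comb, x, y)
    for f, comp, comb in solutions.values()
    for x, y in points
)
```

The residual is zero up to rounding for all four equations, which is consistent with (but of course does not prove) the claim that these are solutions.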

Darboux [17] in 1875 showed for (2.1.1) that continuity everywhere
could be replaced by continuity at a single point. Gradually other
conditions were found: one-sided boundedness in an interval and
finally in a set of positive Lebesgue measure. The latter condition
is essentially necessary since Hamel [33] in 1905 succeeded in showing
the existence of nonmeasurable solutions bounded in no set of
positive measure.

If we assume $f$ to be Lebesgue integrable, say in the interval
$(0, \omega)$, then (2.1.1) gives

(2.1.3)    $f(x) = \int_0^1 f(x + y)\,dy - \int_0^1 f(y)\,dy = \int_x^{x+1} f(s)\,ds - \int_0^1 f(s)\,ds,$

so that $f$ is necessarily continuous and hence also differentiable.
Thus Cauchy's solution is the only integrable solution.

This is a fairly typical situation: measurability implies continuity.
Among recent results along such lines let us quote one due to C. T.
Ionescu Tulcea [45].

We need some definitions and notation. Let $S$ be a locally compact
space on which is defined a positive Radon measure $\mu$ satisfying
certain mild restrictions. There is a composition law
$(s, t) \to s \circ t$ defined for $(s, t) \in D \subset S \times S$ and this
mapping is continuous in $D$. Let $S_f$, $S_g$, $S_h$ be three Hausdorff
spaces and let $G$ be a mapping of $S_g \times S_h$ into $S_f$, continuous
on every compact set $K \subset S_g \times S_h$.

Proposition 15. Let $f$, $g$, $h$ be three mappings of $S$ into $S_f$, $S_g$, $S_h$,
respectively. Suppose $g$ and $h$ are $\mu$-measurable and that for every
$(s, t) \in D$

(2.1.4)    $f(s \circ t) = G[g(s), h(t)].$

Then $f$ is continuous on $S$.

This theorem applies to a large number of functional equations 
commonly considered. 

We saw earlier that any continuous solution of (2.1.1) is also
differentiable and it obviously admits of derivatives of all orders.
This does not imply, however, that in the complex plane a continuous
solution of

$f(z_1 + z_2) = f(z_1) + f(z_2)$

is necessarily analytic. Indeed

$f(x + iy) = ax + by$

is a continuous solution and this is analytic if and only if $b = ia$.
The same phenomenon holds for other equations of the type

(2.1.5)    $f(z_1 \circ z_2) = G[f(z_1), f(z_2)].$

On the other hand, if $G(u, v)$ is a symmetric analytic function of $u$,
$v$ and if the rule of composition is symmetric and analytic, then
analytic solutions may be expected. This does not seem to have
been proved in full generality, however.
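The example $f(x + iy) = ax + by$ can be probed numerically: it is additive for any complex constants, but the Cauchy-Riemann condition $\partial f/\partial y = i\,\partial f/\partial x$ holds only when $b = ia$. The sketch below uses central differences; the helper names, constants, and step size are assumptions for illustration.

```python
def make_f(a, b):
    """f(x + iy) = a*x + b*y with complex constants a, b."""
    return lambda z: a * z.real + b * z.imag

def is_additive(f, z1, z2, tol=1e-12):
    """Check f(z1 + z2) = f(z1) + f(z2) up to rounding."""
    return abs(f(z1 + z2) - (f(z1) + f(z2))) < tol

def cauchy_riemann_defect(f, z, h=1e-6):
    """|df/dy - i*df/dx| by central differences; zero for an analytic f."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return abs(dfdy - 1j * dfdx)

a = 2.0 - 1.0j
f_analytic = make_f(a, 1j * a)   # b = i*a, so f(z) = a*z
f_not_analytic = make_f(a, 3.0)  # b != i*a

z1, z2 = 0.4 + 1.3j, -2.0 + 0.5j
additive_both = is_additive(f_analytic, z1, z2) and is_additive(f_not_analytic, z1, z2)
defect_analytic = cauchy_riemann_defect(f_analytic, z1)
defect_other = cauchy_riemann_defect(f_not_analytic, z1)
```

Since $f$ is real-linear, the defect is simply $|b - ia|$, so it vanishes in the first case and equals $|2 - 2i|$ in the second.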

Let us add some remarks pertaining to the case

(2.1.6)    $f(x + y) = G[f(x), f(y)]$

with a symmetric function $G$, holomorphic in some domain
$D = D_0 \times D_0$ in $C^2 = C \times C$. Let $a$ be a root of the equation

(2.1.7)    $G(a, a) = a, \quad a \in D_0.$

We exclude the case in which this equation is an identity or has
no solutions in $D_0$. Equation (2.1.6) has constant solutions,
$f(x) = a$, where $a$ is any root of (2.1.7). Suppose now that $f(x)$
is a solution of (2.1.6), defined and continuous in some interval
$[0, \omega]$ with $f(0) = a$, $f(x) \not\equiv a$. Suppose that $f(x) \in D_0$ and that

(2.1.8)    $G_u'(a, a) = 1.$

Then it may be shown that $f(x)$ has derivatives of all orders in
$[0, \omega]$ and the higher right-hand derivatives at $x = 0$ are uniquely
determined by $f(0)$ and $f'(0)$. This follows from results due to
Dunford and Hille [19] where the theorem is proved assuming that
$f(x)$ has values in a Banach algebra. It follows that if $f$ and $g$
are two continuous solutions of (2.1.6) and if $f(0) = g(0)$,
$f'(0) = g'(0)$, then $f$ and $g$ are identical in $[0, \omega]$. This does not
exclude the possibility that these solutions may be extended to the
complex plane in two different ways, both extensions satisfying (2.1.6).
At most one of these extensions can be analytic.

The case where $G$ is algebraic and $x \circ y = x + y$ has been examined
in great detail. Here the analytic solutions are known since
Weierstrass. Continuous solutions of a real variable have been
studied by J. F. Ritt [71], P. Montel [59], and P. J. Myrberg [60].
Here it is found that any continuous solution is piecewise analytic.
If $f(x)$ is a continuous solution, then there is an analytic function
$g(z)$ which is algebraic in $z$ or $e^{az}$ or $\wp(bz)$ and constants $\alpha_k$ such
that in the $k$th interval $f(x) = g(x + \alpha_k)$. Myrberg has indicated that
the results extend to associative composition rules.

2.2. A General Method. E. Vincze [80] has introduced a method of
solving functional equations applicable to a rather large class of
equations. It is based essentially on the notion of linear dependence
and its expression in terms of identities involving determinants.
Let

(2.2.1)    $\Delta[F_1(z_1), F_2(z_2), \ldots, F_n(z_n)] = \Delta[F_1, F_2, \ldots, F_n]$

denote the determinant

$\begin{vmatrix} F_1(z_1) & F_2(z_1) & \cdots & F_n(z_1) \\ F_1(z_2) & F_2(z_2) & \cdots & F_n(z_2) \\ \vdots & \vdots & & \vdots \\ F_1(z_n) & F_2(z_n) & \cdots & F_n(z_n) \end{vmatrix}.$

Here the functions are defined in some arbitrary set $S$ of the complex
plane. Suppose that the $n(k + 1)$ functions $F_1(z), \ldots, F_{kn+n}(z)$
satisfy the identity

(2.2.2)    $\sum_{\nu=0}^{k} \Delta[F_{\nu n+1}(z_1), F_{\nu n+2}(z_2), \ldots, F_{\nu n+n}(z_n)] = 0$

for all $z_1, z_2, \ldots, z_n$ in $S$. Let $F_0(z)$ be an arbitrary function.
Then the equation

(2.2.3)    $\sum_{\nu=0}^{k} \Delta[F_{\nu n+1}(z_1), F_{\nu n+2}(z_2), \ldots, F_{\nu n+n}(z_n), F_0(z_{n+1})] = 0$

also holds for all values of $z_1, z_2, \ldots, z_n, z_{n+1}$ in $S$. Vincze calls
the passage from the first equation to the second "extension by
$F_0(z)$." It is the basic tool in his discussion.

Let us further note that $F_1, F_2, \ldots, F_n$ are linearly dependent
in $S$ if and only if

(2.2.4)    $\Delta[F_1(z_1), F_2(z_2), \ldots, F_n(z_n)] = 0$

for all $z_1, z_2, \ldots, z_n$ in $S$.
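The determinant criterion (2.2.4) is easy to experiment with. The sketch below evaluates the determinant of (2.2.1) at a few sample points by cofactor expansion; the functions `K`, `L`, `one`, `square` and the sample points are hypothetical choices, and a single evaluation only illustrates the identity rather than verifying it for all points of $S$.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def delta(functions, points):
    """Determinant whose (i, j) entry is functions[j](points[i]), as in (2.2.1)."""
    return det([[F(z) for F in functions] for z in points])

# Hypothetical example functions (integer-valued, so the arithmetic is exact).
one = lambda z: 1
L = lambda z: z
K = lambda z: 2 + 3 * z      # K = 2*one + 3*L, hence K, L, 1 are linearly dependent
square = lambda z: z * z     # z^2, z, 1 are linearly independent

points = [1, 2, 5]
dependent_value = delta([K, L, one], points)
independent_value = delta([square, L, one], points)
```

For the dependent triple the determinant vanishes at every choice of points; for the independent triple it is a Vandermonde-type determinant and is nonzero whenever the points are distinct.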

Vincze illustrates his method on a number of examples. The
following is one of the simplest. It is desired to find functions
$F$, $G$, $H$, $K$, $L$ satisfying the equation

(2.2.5)    $F(z_1 \circ z_2) = G(z_1) + H(z_2) + K(z_1)L(z_2).$

Here $z_1$, $z_2$, $z_1 \circ z_2$ belong to a set $S$ which is an Abelian semigroup
with respect to the operation $\circ$. In particular, it is assumed that
there is a point $a \in S$ such that the equation $a \circ z = c$ has at least
one solution for every given $c$.

Since the left side of (2.2.5) is symmetric in $z_1$ and $z_2$,

$G(z_1) + H(z_2) + K(z_1)L(z_2) = G(z_2) + H(z_1) + K(z_2)L(z_1)$

or, in determinant notation,

(2.2.6)    $\Delta(G, 1) + \Delta(1, H) + \Delta(K, L) = 0.$

Here we "extend by 1" and obtain

$\Delta(G, 1, 1) + \Delta(1, H, 1) + \Delta(K, L, 1) = 0.$

Since the first two determinants are zero, we are left with

(2.2.7)    $\Delta(K, L, 1) = 0,$

that is, the functions $K$, $L$, 1 are linearly dependent. There are
two possibilities, either

(2.2.8)    A: $L(z) \equiv \beta_1$,  or  B: $K(z) = \beta_1 + \beta_2 L(z)$.

We shall carry through the discussion of case A. Substituting
in (2.2.6) we obtain

$\Delta(G, 1) + \Delta(1, H) + \Delta(K, \beta_1) = \Delta(G, 1) - \Delta(H, 1) + \Delta(\beta_1 K, 1) = \Delta(G - H + \beta_1 K, 1) = 0.$

Thus there is a constant $\beta_2$ such that

$G(z) - H(z) + \beta_1 K(z) = 2\beta_2.$

Combining we get

(2.2.9)    $F(z_1 \circ z_2) = [G(z_1) + \beta_1 K(z_1) - \beta_2] + [G(z_2) + \beta_1 K(z_2) - \beta_2].$

This is a so-called Pexider equation after H. W. Pexider [66], who
considered such equations in 1903.

The equation

(2.2.10)    $f(z_1 \circ z_2) = f(z_1) + f(z_2)$

is a generalized Cauchy equation and reduces to one of the classical
equations if the operation $\circ$ is either addition or multiplication.
Suppose that we can solve (2.2.10) and that $f(z)$ is a solution. Then
we can also solve (2.2.9) by setting

$F(z) = f(z) + 2\beta_3, \quad G(z) = f(z) - \beta_1 K(z) + \beta_2 + \beta_3.$

Thus the final solution in case A is

$F(z) = f(z) + 2\beta_3, \quad G(z) = f(z) - \beta_1 K(z) + \beta_2 + \beta_3,$

$H(z) = f(z) + \beta_3 - \beta_2, \quad K(z)$ arbitrary, $\quad L(z) = \beta_1.$
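The case A solution can be substituted back into (2.2.5) as a sanity check. The sketch below takes the operation to be ordinary addition and $f(z) = cz$; these choices, the particular $K$, and all numerical constants are assumptions made only for the illustration.

```python
import math

beta1, beta2, beta3, c = 0.5, -1.25, 2.0, 3.0  # arbitrary illustrative constants

f = lambda z: c * z                   # additive: f(z1 + z2) = f(z1) + f(z2)
K = lambda z: math.sin(z) + z * z     # K is arbitrary in case A (hypothetical choice)
L = lambda z: beta1
F = lambda z: f(z) + 2 * beta3
G = lambda z: f(z) - beta1 * K(z) + beta2 + beta3
H = lambda z: f(z) + beta3 - beta2

def defect(z1, z2):
    """Left side minus right side of (2.2.5) with z1 o z2 = z1 + z2."""
    return F(z1 + z2) - (G(z1) + H(z2) + K(z1) * L(z2))

worst = max(abs(defect(z1, z2)) for z1 in (-1.5, 0.0, 0.7, 2.0) for z2 in (-0.3, 1.1, 4.0))
```

The $K$ terms cancel between $G(z_1)$ and $K(z_1)L(z_2)$, which is why $K$ can be left arbitrary.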

Case B is more complicated and involves also the solution of
another generalized Cauchy equation

(2.2.11)    $h(z_1 \circ z_2) = h(z_1)h(z_2).$

Aside from this no new principle is involved.

Vincze has also applied his method to generalized sine and cosine
equations where addition is replaced by an associative operation $\circ$.
The method appears to be quite powerful.

2.3. Uniqueness Theorems. A simple uniqueness theorem has
recently been found by Aczél [3].

Proposition 16. Let $H(u, v, x, y)$ be strictly monotone with respect
to $u$ (or $v$). Let $F$ be continuous for all $(x, y)$ in $(A, B) \times (A, B)$
where $(A, B)$ is an arbitrary interval. Let

(2.3.1)    $x < F(x, y) < y$  if  $A < x < y < B.$

Then the functional equation

(2.3.2)    $f[F(x, y)] = H[f(x), f(y), x, y]$

has at most one continuous solution defined in $(A, B)$ and satisfying
the initial conditions

(2.3.3)    $f(a) = c, \quad f(b) = d, \quad A < a < b < B.$

The proof is so simple and elegant that we shall sketch it here.
Suppose that there are two distinct continuous solutions $f_1$ and $f_2$
and consider first the interval $[a, b]$. If there is an $x_1 \in (a, b)$
where $f_1(x_1) \ne f_2(x_1)$, set

$C = \max\,[x \mid a \le x < x_1,\ f_1(x) = f_2(x)],$

$D = \min\,[x \mid x_1 < x \le b,\ f_1(x) = f_2(x)].$

Then $f_1(C) = f_2(C)$, $f_1(D) = f_2(D)$ and $C < D$. Now

$f_1[F(C, D)] = H[f_1(C), f_1(D), C, D] = H[f_2(C), f_2(D), C, D] = f_2[F(C, D)].$

Here $C < F(C, D) < D$ by (2.3.1) and this shows the existence of
an $x_0$ between $C$ and $D$ where $f_1(x_0) = f_2(x_0)$. This contradicts the
definition of $C$ and $D$ and shows that $f_1(x) = f_2(x)$ in $[a, b]$.

Next we consider the interval $(b, B)$. Let

$E = \max\,[x \mid a \le x < B,\ f_1(x) = f_2(x)].$

If $E < B$, there is an interval $(E, E + \epsilon)$ where $f_1 \ne f_2$. By (2.3.1)
and the continuity of $F$, we can find an $x_2$ in this interval such that

$a < F(x_2, a) < E.$

We have then on the one hand

$f_1[F(x_2, a)] = H[f_1(x_2), f_1(a), x_2, a] \ne H[f_2(x_2), f_2(a), x_2, a] = f_2[F(x_2, a)],$

since $H$ is strictly monotone in the first argument and
$f_1(x_2) \ne f_2(x_2)$. On the other hand,

$f_1[F(x_2, a)] = f_2[F(x_2, a)],$

since $f_1$ and $f_2$ are equal in $[a, E]$. This contradiction shows that
$E = B$, that is, $f_1 = f_2$ also in $(b, B)$. In the same manner the
remaining interval $(A, a)$ is handled.

It follows that the solution, if it exists and is continuous, is 
determined uniquely by the initial conditions (2.3.3). 

Among the equations to which this argument applies is Jensen's
equation

(2.3.4)    $f\left(\frac{x + y}{2}\right) = \frac{1}{2}[f(x) + f(y)].$

Compare the inequality (1.6.2) for convex functions. Now (2.3.4)
is obviously satisfied by a linear function

(2.3.5)    $f(x) = \alpha x + \beta,$

and such a function can be made to satisfy two conditions of type
(2.3.3) which determine $\alpha$ and $\beta$ uniquely. By Proposition 16 this is
then the only continuous solution.
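A small sketch of the Jensen example: two initial conditions $f(a) = c$, $f(b) = d$ fix the linear function, and the result satisfies Jensen's equation. The helper name `linear_solution` and the numerical values are illustrative assumptions.

```python
def linear_solution(a, c, b, d):
    """Return f(x) = alpha*x + beta with f(a) = c and f(b) = d (requires a != b)."""
    alpha = (d - c) / (b - a)
    beta = c - alpha * a
    return lambda x: alpha * x + beta

# Initial conditions f(1) = 2, f(3) = -4 give alpha = -3, beta = 5.
f = linear_solution(a=1.0, c=2.0, b=3.0, d=-4.0)

def jensen_defect(f, x, y):
    """Absolute difference between the two sides of Jensen's equation."""
    return abs(f((x + y) / 2) - 0.5 * (f(x) + f(y)))

worst = max(jensen_defect(f, x, y) for x in (-2.0, 0.5, 1.0) for y in (0.0, 3.0, 7.5))
```

Here $F(x, y) = (x + y)/2$ lies strictly between $x$ and $y$, so condition (2.3.1) is met and Proposition 16 applies.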

More sophisticated examples can be found in the theory of averages
and in information theory (Aczél, loc. cit.). It will be mentioned
later in Section 3.1 that each A-average corresponds to a continuous
strictly monotone function $F$ in terms of which the average is
expressible. It is natural to ask if this function is uniquely determined.
If we restrict ourselves to two variables, but generalize the
question slightly, we are led to the functional equation

(2.3.6)    $f[F(x, y)] = \frac{p f(x) + q f(y)}{p + q},$

where $p$ and $q$ are arbitrary positive numbers and

(2.3.7)    $F(x, y) = g^{-1}\left(\frac{p g(x) + q g(y)}{p + q}\right).$

Here $g$ is a given continuous strictly monotone function and $g^{-1}$
denotes its inverse. This equation is of type (2.3.2). It is not
difficult to see that a solution is given by

(2.3.8)    $f(x) = \alpha g(x) + \beta,$

where $\alpha$ and $\beta$ are arbitrary constants. Here again a condition of
type (2.3.3) determines the constants uniquely and Proposition 16
shows that this is the only continuous solution taking on the prescribed
initial values. It is clear that $f(x)$ is also strictly monotone
so we have

(2.3.9)    $f^{-1}\left(\frac{p f(x) + q f(y)}{p + q}\right) = g^{-1}\left(\frac{p g(x) + q g(y)}{p + q}\right).$
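The claim that $f = \alpha g + \beta$ solves (2.3.6) can be checked directly. In the sketch below the generating function is taken to be $g = \log$ with inverse $\exp$; this choice, the weights, and the constants are assumptions for illustration only.

```python
import math

p, q = 2.0, 5.0                  # arbitrary positive weights
g, g_inv = math.log, math.exp    # example of a continuous strictly monotone g and its inverse
alpha, beta = -3.0, 0.75         # arbitrary constants in (2.3.8)

def F(x, y):
    """(2.3.7): weighted mean of x and y generated by g."""
    return g_inv((p * g(x) + q * g(y)) / (p + q))

def f(x):
    """(2.3.8): candidate solution alpha*g(x) + beta."""
    return alpha * g(x) + beta

def defect(x, y):
    """Left side minus right side of (2.3.6)."""
    return f(F(x, y)) - (p * f(x) + q * f(y)) / (p + q)

grid = [(x, y) for x in (0.2, 1.0, 3.5) for y in (0.5, 2.0, 9.0)]
worst = max(abs(defect(x, y)) for x, y in grid)
lies_between = all(min(x, y) <= F(x, y) <= max(x, y) for x, y in grid)
```

The second check reflects condition (2.3.1): the mean $F(x, y)$ falls between its arguments, which is what makes Proposition 16 applicable.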

A similar equation occurs in information theory, namely

(2.3.10)    $f\left[g^{-1}\left(\frac{x g(x) + y g(y)}{x + y}\right)\right] = \frac{x f(x) + y f(y)}{x + y}.$

This is also of type (2.3.2) and here again the solution is of type
(2.3.8).

Condition (2.3.1) is obviously rather restrictive and it is not satisfied
by addition theorems. I am indebted to Professor Aczél for
the information that he, in collaboration with M. Hosszú, has succeeded
in replacing this condition by less restrictive assumptions.
Thus, for the important case

(2.3.11)    $f[F(x, y)] = H[f(x), f(y)],$

there can be at most one continuous solution satisfying (2.3.3), if $F$
is continuous and $F(x, y)$ and $H(u, v)$ are both strictly increasing
(strictly decreasing) with respect to both variables involved.

2.4. Miscellanea. In this section we discuss briefly some questions 
involving functional equations which were discussed at the two 
Oberwolfach conferences. The first two questions deal with Abel's 
equation and are based on communications by Helmuth Kneser and 
A. Ostrowski, respectively. 

The functional equation

(2.4.1)    $F[f(x)] = F(x) + 1,$

where $f$ is a given function, was studied by Abel [1] in a posthumous
paper. It has been the object of much research. W. Schöbe [73]
has given a solution of the special case

(2.4.2)    $F(e^x) = F(x) + 1$

using a double iteration process. Set

$L(x) = \log(1 + x), \quad e(x) = e^x,$

and define

$L^n(x) = L[L^{n-1}(x)], \quad e^n(x) = e[e^{n-1}(x)], \quad n > 1.$

Set 

(2.4.3) OM-Jta |i4j-” + ^ 0g ”!' 

(2.4.4)    $h(x) = \lim_{n \to \infty} L^n[e^n(x)].$

The second limit exists for real values of $x$ since the sequence
involved is strictly increasing and bounded on compact sets. It
is not known if $h(x)$ is analytic. The function $g$ on the other hand
is analytic. These functions satisfy the relations

(2.4.5)    $g[L(x)] = g(x) + 1, \quad h(x) = L[h(e(x))],$
and 

(2.4.6) F(x) - 

is a solution of (2.4.2). It is known that this equation has analytic 
solutions but it is not known if (2.4.6) defines such a solution. 

Abel's equation (2.4.1) plays a role in the theory of convergence
of infinite series, more precisely for the convergence criteria of V. P.
Ermakov. In his first paper of 1871, Ermakov considered series of
the form

(2.4.7)    $\sum F(n),$

where $F(x)$ is positive, continuous, and strictly decreasing. He
showed that if a continuously differentiable function $f$ could be
found such that $f(x) > x$, $f'(x) > 0$, and

(2.4.8)    $\frac{F[f(x)]\,f'(x)}{F(x)} \le q < 1,$

then the series converges, whereas if the quotient is $\ge 1$ it diverges.
In a later paper of 1883, he tried to eliminate the condition that
$F(x)$ be monotone, though not with complete success. Correct
conditions were proved by A. Ostrowski [65] in 1956. The main
condition is that $f'(x)$ should be monotone, and if increasing it should
attain or surpass unity. Following Ermakov he based the proof
on the construction of a solution of Abel's equation with certain
special properties.

At the 1962 conference W. Eichhorn (Würzburg) made an interesting
application of the theory of hypercomplex numbers to a
system of functional equations (see [90]). Let

(2.4.9)    $F_i(s + t) = \sum_{j=1}^{n} \sum_{k=1}^{n} \gamma_{jk}^{i} F_j(s) F_k(t), \quad i = 1, 2, \ldots, n.$

He constructs an algebra $A$ with basis elements $e_1, e_2, \ldots, e_n$
and the multiplication table

(2.4.10)    $e_j e_k = \sum_{i=1}^{n} \gamma_{jk}^{i} e_i, \quad j, k = 1, 2, \ldots, n.$

Depending upon the "structure constants" $\gamma_{jk}^{i}$, $A$ may be neither
associative nor commutative. Multiplying the $i$th equation under
(2.4.9) by $e_i$ and summing for $i$, we get

(2.4.11)    $F(s + t) = F(s)F(t), \quad F = (F_1, F_2, \ldots, F_n).$

If $F$ is continuous and differentiable for all finite $s$, there exists an
idempotent $b \in A$, $b^2 = b$, and an $a \in A$ such that $b a^k = a^k b = a^k$
for all natural numbers $k$ and

(2.4.12)    $F(s) = b \exp(as).$

At the 1962 conference O. Taussky-Todd (Pasadena) raised the
question of the solution of Cauchy's equation for quaternions or
for Cayley numbers. At the 1963 conference some results of
I. Makai (Debrecen) were announced which have a bearing on this
problem. If lower-case letters denote scalars and capitals quaternions,
then the two equations

(2.4.13)    $f(XY) = f(X)f(Y), \quad F(xy) = F(x)F(y)$

have as general solutions

(2.4.14)    $f(X) = m(|X|), \quad F(x) = m(x)[\cos \omega(x) + J \sin \omega(x)],$

where

$m(xy) = m(x)m(y), \quad \omega(xy) \equiv \omega(x) + \omega(y) \pmod{2\pi},$

and $J$ is any fixed unit quaternion with scalar part 0.

This finishes our survey of current events in the theory of functional
equations.

MEAN VALUES 

3.0. Orientation. Since Euclid proved that the geometric mean
of two positive quantities cannot exceed their arithmetic mean, we
can claim that considerations of mean values go back to antiquity.
We have to wait until Cauchy and Gauss, however, for more sophisticated
results, and the theory started to flourish only after L. Fejér
[25] had shown in 1900 the importance of mean values for the theory
of Fourier series. The study of special means, initiated by G.
Frobenius [31] in 1880, O. Hölder [43] in 1882, and E. Cesàro [14] in
1890, ultimately led to a profusion of averaging processes applicable
to the various needs of analysis.

Postulates for arithmetic means were given by R. Schimmack [87]
in 1909 and these and other postulates were examined for logical
independence by E. V. Huntington [85] in 1927 (references kindly
supplied by J. B. Diaz and F. T. Metcalf). Complete determination
of the mean values satisfying a given set of postulates was given
in 1930 by A. N. Kolmogoroff [52] and M. Nagumo [62] and in 1931
by B. de Finetti [30]. This class of mean values will be discussed in
the following and other applications will be found later in this
report. It is by no means the only class worthy of study, but its
general usefulness is unsurpassed.

There are three sections: The Postulates, Consequences of the
Postulates, and Remarks on Limitation.

3.1. The Postulates. We consider an average subject to the following
conditions.

(i) For each $n$ and each set of $n$ positive numbers $x_1, x_2, \ldots, x_n$
there is a positive number $A(x_1, x_2, \ldots, x_n)$ called the A-average
of these numbers.

(ii) $A(x_1, x_2, \ldots, x_n)$ is a continuous symmetric function of its
arguments, strictly increasing in each of them.

(iii) $A(x, x, \ldots, x) = x.$

(iv) If $y = A(x_1, x_2, \ldots, x_k)$, then

$A(x_1, x_2, \ldots, x_k, x_{k+1}, \ldots, x_n) = A(y, y, \ldots, y, x_{k+1}, \ldots, x_n),$

where $y$ is repeated $k$ times.

Here condition (iv) can be regarded as a class of functional equations
satisfied by $A$. For $n = 3$, $k = 2$ this equation takes the form

(3.1.1)    $A(x_1, x_2, x_3) = A[A(x_1, x_2), A(x_1, x_2), x_3].$

These various conditions naturally do not determine $A$ in any
way uniquely, but we have the following:

Proposition 17. To each A-average corresponds a strictly monotone
function $F$, defined and continuous for $0 < u < \infty$, such that

(3.1.2)    $A(x_1, x_2, \ldots, x_n) = F^{-1}\left(\frac{1}{n} \sum_{j=1}^{n} F(x_j)\right),$

where $F^{-1}$ is the inverse of $F$.

Cf. remarks in Section 2.3 concerning the uniqueness of the
association $A \leftrightarrow F$.

Additional conditions imposed on $A$ tend to restrict $F$ still further.
Thus, if $A$ is to be homogeneous of degree 1,

(v) $A(tx_1, tx_2, \ldots, tx_n) = tA(x_1, x_2, \ldots, x_n),$

then $F(u)$ is either

$u^{\alpha}$  or  $\log u$

with $\alpha$ real. If $A$ is translation covariant

(vi) $A(x_1 + a, x_2 + a, \ldots, x_n + a) = A(x_1, x_2, \ldots, x_n) + a,$

then $F(u)$ is either

$e^{\alpha u}$, $\alpha \ne 0$,  or  $u$.
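Proposition 17 and the two special classes above lend themselves to a short numerical sketch. The helper `quasi_arithmetic`, the exponents, and the sample data are assumptions for illustration; the code only spot-checks properties (iv), (v), and (vi) on one data set.

```python
import math

def quasi_arithmetic(F, F_inv):
    """Return A(x_1, ..., x_n) = F_inv(mean of F(x_j)), as in (3.1.2)."""
    def A(*xs):
        return F_inv(sum(F(x) for x in xs) / len(xs))
    return A

power = quasi_arithmetic(lambda u: u ** 3, lambda v: v ** (1.0 / 3))   # F(u) = u^3
geometric = quasi_arithmetic(math.log, math.exp)                        # F(u) = log u
exponential = quasi_arithmetic(lambda u: math.exp(0.5 * u),
                               lambda v: math.log(v) / 0.5)             # F(u) = e^{u/2}

xs, t, shift = (1.0, 2.0, 4.0), 2.5, 1.5
scaled = tuple(t * x for x in xs)
shifted = tuple(x + shift for x in xs)

# (v): homogeneity of degree 1 for the power and geometric means.
homogeneity_defects = [abs(A(*scaled) - t * A(*xs)) for A in (power, geometric)]
# (vi): translation covariance holds for the exponential mean, not for the power mean.
translation_defect = abs(exponential(*shifted) - (exponential(*xs) + shift))
power_translation_defect = abs(power(*shifted) - (power(*xs) + shift))
# (iv) with n = 3, k = 2, i.e. equation (3.1.1).
y = power(xs[0], xs[1])
iv_defect = abs(power(*xs) - power(y, y, xs[2]))
geometric_value = geometric(*xs)
```

With the data (1, 2, 4) the geometric mean is exactly 2, which gives a convenient reference value.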

3.2. Consequences of the Postulates. Without using Proposition 17
one can prove a number of important properties of the averages
directly from the postulates. We have

Proposition 18. Unless all the $x$'s are equal

(3.2.1)    $\min x_j < A(x_1, x_2, \ldots, x_n) < \max x_j.$

This follows from (ii) + (iii).

Proposition 19. The average of $k$ sets $x_1, x_2, \ldots, x_n$ equals the
average of one set:

(3.2.2)    $A(x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_n, \ldots, x_n) = A(x_1, x_2, \ldots, x_n).$

Proof by rearrangement and repeated use of (iv).

This gives a convenient method of extending and contracting
averages. The same method leads to the principle of repeated
averages:

Proposition 20. From the $n$ numbers $x_1, x_2, \ldots, x_n$ select $k$ and
form their average. The average of all the averages obtainable in this
manner equals $A(x_1, x_2, \ldots, x_n)$.

To prove this we enlarge the given set $E$ so as to obtain $\binom{n-1}{k-1}$
copies of each element $x_j$. The enlarged set $E^*$ can be separated
into $\binom{n}{k}$ distinct subsets each forming a selection of $k$ elements of $E$.
Averaging over $E$ or over $E^*$ gives the same result. On the other
hand, in the average over $E^*$ we can replace the elements of a subset
by their average repeated $k$ times. Thus the average over $E^*$ equals
the average of the $\binom{n}{k}$ averages each repeated $k$ times. By
Proposition 19 this reduces to the average of the averages as asserted.

This leads to a generalization of Proposition 18. Let

$y_{k,m}, \quad m = 1, 2, \ldots, \binom{n}{k},$

denote the averages of $x_1, x_2, \ldots, x_n$ taken $k$ at a time.

Proposition 21. For each $k < n$

(3.2.3)    $\min_m y_{k,m} < A(x_1, x_2, \ldots, x_n) < \max_m y_{k,m}$

unless all the $x$'s are equal.

The last remark follows from the fact that the $y$'s are equal if
and only if the $x$'s are equal.
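Propositions 20 and 21 can be illustrated by brute force over all $k$-element selections. The sketch uses the harmonic mean, which is the A-average generated by $F(u) = 1/u$; the data set and the value of $k$ are arbitrary assumptions.

```python
from itertools import combinations

def A(*xs):
    """Harmonic mean: the A-average generated by F(u) = 1/u."""
    return len(xs) / sum(1.0 / x for x in xs)

xs = (1.0, 2.0, 3.0, 7.0, 10.0)
k = 3

# The averages y_{k,m} of all selections of k elements.
sub_averages = [A(*subset) for subset in combinations(xs, k)]

total = A(*xs)
average_of_averages = A(*sub_averages)   # Proposition 20 says this equals total
```

Each element occurs in the same number of selections, which is the counting fact behind the enlargement argument in the proof.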

This means that the averaging process is oscillation reducing in
the following sense. Let $S$ be an infinite bounded set of positive
numbers. Select $n$ distinct numbers from this set and form their
A-average. The set of all such averages involving $n$ numbers forms
a set $S_n$. For each $n$ set

$a_n = \inf S_n, \quad A_n = \sup S_n.$

Proposition 22. For all $n > 1$

(3.2.4)    $a_{n-1} \le a_n \le A_n \le A_{n-1},$

so that

(3.2.5)    $A_n - a_n \le A_{n-1} - a_{n-1}.$

This follows from Proposition 21.

3.3. Remarks on Limitation. One of the most important applications
of averaging processes is to the summation of infinite series or,
equivalently, to the limitation of infinite sequences. Here the
question of preservation of limits is basic. The A-averages are
limit preserving. This follows from

Proposition 23. The A-average of $k$ numbers $a$ and $n$ numbers $b$
converges to $b$ if $n \to \infty$ in such a way that $k/n \to 0$.

This is not so easy to derive directly from the postulates but it
follows readily from Proposition 17. For if the average in question
is $A_{k,n}$, then

$F(A_{k,n}) = \frac{k F(a) + n F(b)}{k + n} \to F(b),$

and since $F$ is continuous and strictly monotone $A_{k,n} \to b$ as
asserted.

Proposition 24. If $\lim x_n = x_0$, then

(3.3.1)    $\lim A(x_1, x_2, \ldots, x_n) = x_0.$

Suppose that $0 < \alpha < x_j < \beta$ for all $j$ and that for a given $\epsilon > 0$
we have $0 < x_0 - \epsilon < x_j < x_0 + \epsilon$ for $j > k$. Then

$A(\alpha, \ldots, \alpha, x_0 - \epsilon, \ldots, x_0 - \epsilon) < A(x_1, \ldots, x_k, \ldots, x_n) < A(\beta, \ldots, \beta, x_0 + \epsilon, \ldots, x_0 + \epsilon).$

The first member converges to $x_0 - \epsilon$, the third to $x_0 + \epsilon$ whence
the conclusion follows since the extension to the case $x_0 = 0$ is
obvious.
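Propositions 23 and 24 can be watched at work numerically. The sketch uses the geometric mean (generated by $F(u) = \log u$); the numbers `a`, `b`, `k` and the test sequence $x_n = 3 + 1/n$ are assumptions chosen for illustration.

```python
import math

def geometric_mean(xs):
    """A-average generated by F(u) = log u."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Proposition 23: k copies of a and n copies of b, with k fixed so that k/n -> 0.
a, b, k = 100.0, 2.0, 5
gaps_23 = [abs(geometric_mean([a] * k + [b] * n) - b) for n in (10, 1000, 100000)]

# Proposition 24: x_n = 3 + 1/n converges to 3, and so do the successive averages.
def successive_average(n):
    return geometric_mean([3.0 + 1.0 / j for j in range(1, n + 1)])

gaps_24 = [abs(successive_average(n) - 3.0) for n in (10, 1000, 100000)]
```

Both lists of gaps shrink as $n$ grows, although slowly in the second case, since the early terms of the sequence are only averaged away at a rate of roughly $(\log n)/n$.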

We can obtain greater power and flexibility of averaging processes
by composition. This was first observed by Otto Hölder [43] in
1882 for the case of the arithmetic means, $F(u) = u$. More generally,
suppose that $A_1$ and $A_2$ are two A-averages, distinct or not,
and form

(3.3.2)    $[A_1 * A_2](x_1, x_2, \ldots, x_n) = A_1[A_2(x_1), A_2(x_1, x_2), \ldots, A_2(x_1, x_2, \ldots, x_n)].$

We have then the consistency theorem:

Proposition 25. If $\{x_n\}$ is a sequence such that

(3.3.3)    $\lim A_2(x_1, x_2, \ldots, x_n) = x_0,$

then also

(3.3.4)    $\lim\,[A_1 * A_2](x_1, x_2, \ldots, x_n) = x_0.$

This follows from Proposition 24.

Naturally we have also that

$A_1$-$\lim x_n = y_0$  implies  $A_2 * A_1$-$\lim x_n = y_0$.

Since in general $A_1 * A_2 \ne A_2 * A_1$, we cannot conclude that $y_0 = x_0$,
that is, $A_1$ and $A_2$ need not be consistent.

We can clearly resort to compositions involving three or more
"factors." Besides the Hölder means, not much use has been made
of this device, one reason being the restriction to positive numbers
which is undesirable for many applications.

Let us return to the oscillation-reducing property noticed above.
This can be sharpened as follows. Let $\{x_n\}$ be a given sequence of
positive numbers, $\{A(x)\}$ the sequence of successive A-averages. If

(3.3.5)    $\xi_1 = \liminf x_n, \quad \xi_2 = \limsup x_n, \quad \eta_1 = \liminf A(x), \quad \eta_2 = \limsup A(x),$

then using Propositions 21 and 23 we obtain

Proposition 26. We have

(3.3.6)    $\eta_2 - \eta_1 \le \xi_2 - \xi_1.$

This classical result has been widely extended to more general
limiting processes and to complex-valued sequences and functions
by K. Knopp [50]. His Kernsatz expresses that the convex hull
of the cluster points of the transformed sequence is contained in the
convex hull of the cluster points of the original sequence. Formula
(3.3.6) is clearly a special case.

Knopp has also devoted much attention to the question of
inequalities between different averages. See [51] and the literature
quoted there. For the older literature see Hardy, Littlewood, and
Pólya [34]. A typical result of Knopp's is

Proposition 27. Let $F$ be twice differentiable and $F''(u) > 0$ in
$(0, \infty)$. Let $A$ be the average defined by $F(u)$ and $M_1$ the arithmetic
mean defined by $u$. Let $x_1, x_2, \ldots, x_n$ lie in $(a, b)$, $0 < a < b < \infty$.
Then

(3.3.7)    $0 \le F(A) - F(M_1) \le \lambda F(a) + (1 - \lambda)F(b) - F(\lambda a + (1 - \lambda)b),$

where

$\lambda = \frac{1}{b - a}\left[b - g\left(\frac{F(b) - F(a)}{b - a}\right)\right]$  and  $F'[g(u)] = u.$

This does not include the inequality of P. Schweitzer [88] and
L. V. Kantorovič [47]. If $M_1$ is the arithmetic, $M_{-1}$ the harmonic
mean of $n$ numbers in $[a, b]$, then

(3.3.8)    $\frac{M_1}{M_{-1}} \le \frac{(a + b)^2}{4ab}.$

For some recent results along these lines see Cargo and Shisha [9]. 
See also Diaz and Metcalf [84] to whom I am indebted for a number 
of references. 


TRANSFINITE DIAMETERS 

4.0. Orientation. In 1923 M. Fekete [26] investigated the distribution
of the roots of a certain class of algebraic equations. He
found that if the roots are restricted to a bounded closed set $E$ in
the complex plane, then the asymptotic behavior of the corresponding
discriminants leads to a significant set function which he called the
transfinite diameter of $E$. More precisely, he introduced

(4.0.1)    $d_0(E) = \lim_{n \to \infty} \max \Bigl\{ \prod_{j<k} |z_j - z_k|^{2/(n(n-1))} \Bigm| z_1, \ldots, z_n \in E \Bigr\}.$

The expression to be maximized is the geometric mean of the distances
between the points $z_1, z_2, \ldots, z_n$ in $E$. This is the A-average
corresponding to $F(u) = \log u$ in our previous notation. Fekete
showed in a number of papers [27] that this concept is closely related
to the Čebyšev constant of $E$, the equilibrium potential, and the
exterior conformal mapping radius. See below.
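The expression inside the limit in (4.0.1) can be evaluated for a concrete configuration. The sketch below uses the $n$th roots of unity on the unit circle, for which the product of pairwise distances equals $n^{n/2}$ (from the discriminant of $z^n - 1$), so the geometric mean of the distances is $n^{1/(n-1)}$. Treating these points as the maximizing configuration for the unit disc is a classical fact that is assumed here, not derived, and the function names are illustrative.

```python
import cmath
import math
from itertools import combinations

def geometric_mean_distance(points):
    """Geometric mean of |z_j - z_k| over all pairs j < k."""
    pairs = list(combinations(points, 2))
    return math.exp(sum(math.log(abs(z - w)) for z, w in pairs) / len(pairs))

def roots_of_unity(n):
    """The n-th roots of unity, equally spaced on the unit circle."""
    return [cmath.exp(2j * math.pi * m / n) for m in range(n)]

values = {n: geometric_mean_distance(roots_of_unity(n)) for n in (2, 5, 20, 200)}
```

The values decrease toward 1 as $n$ grows, in line with the transfinite diameter of the unit disc being 1.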

In 1931 Pólya and Szegő [69] extended the investigation to $R^3$.
They wanted to keep the connection with potential theory and
found that this requires replacing the geometric mean by the harmonic
mean based on the function $F(u) = u^{-1}$. They also considered
general power means, $F(u) = u^p$.

A number of further generalizations were introduced by F. Leja
in a series of papers starting in 1934. See in particular [54, 56]
and the bibliography in [57]. Leja considered the connections with
conformal mapping, various associated sequences of polynomials,
functions of two complex variables, and general metric spaces.

In the complex plane other transfinite diameters have been
defined by Otto Frostman [32] and by M. Tsuji [78] using spherical
or hyperbolical metrics.

There are three sections: The Transfinite A-diameter, The Čebyšev
Constant, and Special Cases.

4.1. The Transfinite A-diameter. The following is based on Hille
[38, 40] with additions and amendments.

Let $X$ be a complete metric space, $E$ a bounded set in $X$. Let
$A$ be an averaging process satisfying postulates (i)-(iv). Let $n > 1$
and take $n$ distinct points $P_1, P_2, \ldots, P_n$ of $E$. Note the
distances

$\delta_{jk} = d(P_j, P_k), \quad 1 \le j < k \le n.$

Here $\delta_{jk} \le \delta(E)$, the topological diameter of $E$. Form the A-average
of these distances, $A(\delta_{jk})$, which is also $\le \delta(E)$. Next we
determine

$\delta_n(E) = \sup A(\delta_{jk})$

when the $n$ points range over $E$. This leads to a bounded sequence
$\{\delta_n(E)\}$ of positive numbers. This sequence is nonincreasing for
any admissible $A$.

To show this we consider

$A[d(Q_j, Q_k)], \quad 1 \le j < k \le n + 1,$

where the points $Q_j$ are chosen so that the average exceeds
$\delta_{n+1}(E) - \epsilon$ for a given $\epsilon > 0$. This is an average of $\frac{1}{2}n(n + 1)$
numbers. Let us repeat each distance $n - 1$ times, thus obtaining
a set of $\frac{1}{2}(n - 1)n(n + 1)$ numbers having the same average. We
arrange these numbers into $n + 1$ groups of $\frac{1}{2}n(n - 1)$ numbers
each such that the distances which do not involve the point $Q_p$ are
placed in the $p$th group. In forming the average, we can replace
each element in the $p$th group by the average $\eta_p$ of the elements in
the group. This gives

$A(\eta_1, \ldots, \eta_1, \eta_2, \ldots, \eta_2, \ldots, \eta_{n+1}, \ldots, \eta_{n+1}) = A(\eta_1, \eta_2, \ldots, \eta_{n+1}).$

Hence

$\delta_{n+1}(E) - \epsilon < A(\eta_1, \eta_2, \ldots, \eta_{n+1}) \le \max \eta_j \le \delta_n(E)$

so that

(4.1.1)    $\delta_{n+1}(E) \le \delta_n(E)$

and

(4.1.2)    $\lim_{n \to \infty} \delta_n(E) \equiv A\text{-}\delta_0(E)$

exists. This is by definition the transfinite A-diameter of $E$.
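The monotonicity (4.1.1) can be observed by brute force on a small finite point set, where the supremum becomes a maximum over all choices of $n$ points. The sketch below is an illustration under assumptions: the set `E` in the Euclidean plane is made up, and the arithmetic mean is used as the A-average.

```python
import math
from itertools import combinations

def average(values):
    """Arithmetic mean, the A-average generated by F(u) = u."""
    values = list(values)
    return sum(values) / len(values)

def delta_n(points, n):
    """Largest A-average of pairwise distances over all choices of n distinct points."""
    return max(
        average(math.dist(p, q) for p, q in combinations(subset, 2))
        for subset in combinations(points, n)
    )

# Hypothetical finite set E in the plane.
E = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 0.5), (1.5, 1.5), (-1.0, 2.0)]
deltas = [delta_n(E, n) for n in range(2, len(E) + 1)]
diameter = math.dist((3.0, 0.5), (-1.0, 2.0))   # largest pairwise distance in E
```

For $n = 2$ the quantity is just the topological diameter, and each further value is no larger than the one before it.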

This set function has important properties of monotony and
continuity. The following is obvious:

Proposition 28. If $E_1 \subset E_2$, then

(4.1.3)    $\delta_0(E_1) \le \delta_0(E_2).$

This is the strongest statement that we can make. In fact,
it often happens that $E$ and its boundary $\partial E$ have the same
A-diameter.

Proposition 29. Let $E_\epsilon$ be the set of points of $X$ having a distance
from $E$ not exceeding $\epsilon$. Then

(4.1.4)    $\lim_{\epsilon \downarrow 0} \delta_0(E_\epsilon) = \delta_0(E).$

To show this, choose an arbitrary integer $n$, let $\eta > 0$ be given
and find points $Q_1, \ldots, Q_n$ in $E_\epsilon$ whose average distance
$A(Q) \ge \delta_n(E_\epsilon) - \eta$. For each point $Q_j$ we can find a point $P_j \in E$ such
that $0 \le d(P_j, Q_j) \le \epsilon$, $P_j \ne P_k$ if $j \ne k$. Then

$d(Q_j, Q_k) \le d(P_j, P_k) + 2\epsilon.$

Thus

$\delta_n(E_\epsilon) - \eta \le A[d(P_j, P_k) + 2\epsilon].$

Now the function $A(x)$ is continuous in each of its $\frac{1}{2}n(n - 1)$
arguments at the point $\{\ldots, d(P_j, P_k), \ldots\}$. This means
that a $\Delta(\epsilon)$ can be found which goes to 0 with $\epsilon$ such that

$A[d(P_j, P_k) + 2\epsilon] < A[d(P_j, P_k)] + \Delta(\epsilon) \le \delta_n(E) + \Delta(\epsilon).$

Hence

$\delta_n(E_\epsilon) \le \delta_n(E) + \Delta(\epsilon) + \eta.$

Here the left member is $\ge \delta_0(E_\epsilon)$, so we have

$\delta_0(E_\epsilon) \le \delta_n(E) + \Delta(\epsilon) + \eta.$

The left member is a nondecreasing function of $\epsilon$ so it tends to a
limit as $\epsilon \to 0$ and $\Delta(\epsilon) \to 0$. Hence

$\lim_{\epsilon \downarrow 0} \delta_0(E_\epsilon) \le \delta_n(E) + \eta$

for every $n$. Here we let $n \to \infty$ and obtain

$\lim_{\epsilon \downarrow 0} \delta_0(E_\epsilon) \le \delta_0(E) + \eta$

for every $\eta > 0$ and hence also for $\eta = 0$. Since the limit is at
least $\delta_0(E)$, formula (4.1.4) follows.

4.2. The Čebyšev Constant. Let $E$ be a bounded closed set in the
complex plane and let $P_n(z)$ be any polynomial of degree $n$ and
leading term $z^n$. The absolute value of $P_n(z)$ has a maximum in $E$
which is attained. In other words, the geometric mean of the
distances from the variable point $z$ to the zeros of $P_n(z)$ has a
maximum in $E$. The set of all such maxima for $P_n$, ranging over all
admissible polynomials, is bounded below and the infimum is
reached. There is a unique polynomial $T_n(z)$, called the $n$th
Čebyšev polynomial for $E$, such that

(4.2.1) $\max_{z \in E} |T_n(z)|^{1/n} = \min_{P_n} \max_{z \in E} |P_n(z)|^{1/n} \equiv M_n(E).$

The classical Čebyšev constant of $E$ is then by definition

(4.2.2) $\chi(E) = \lim_{n \to \infty} M_n(E).$

The existence of this limit follows from the general argument given
below. Fekete proved that

(4.2.3) $\chi(E) = \delta_0(E).$
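
For a concrete feeling of (4.2.1) and (4.2.2), one may take $E = [-1, 1]$, which is an assumption made here and not an example from the text. The sketch below relies on the standard fact that the monic polynomial of least maximum modulus on $[-1, 1]$ is the classical Chebyshev polynomial divided by $2^{n-1}$; the function names and the grid size are hypothetical.

```python
def monic_chebyshev(n, x):
    """Value at x of the monic Chebyshev polynomial of degree n >= 1 for [-1, 1]."""
    t_prev, t_curr = 1.0, x  # T_0(x) and T_1(x)
    for _ in range(n - 1):
        # three-term recurrence T_{k+1} = 2 x T_k - T_{k-1}
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr / 2.0 ** (n - 1)


def m_n(n, samples=2001):
    """Approximate M_n(E) = (max over E of |T_n(z)|)^(1/n) on a uniform grid of [-1, 1]."""
    grid = (-1.0 + 2.0 * i / (samples - 1) for i in range(samples))
    peak = max(abs(monic_chebyshev(n, x)) for x in grid)
    return peak ** (1.0 / n)


values = {n: m_n(n) for n in (1, 2, 4, 8, 16, 32)}
for n, value in values.items():
    print(n, round(value, 4))
```

Under these assumptions $M_n = 2^{(1-n)/n}$, which decreases toward $\tfrac{1}{2}$, the known transfinite diameter of a segment of length 2.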

This construction presupposes that the given set is in the complex
plane and that the geometric mean is used to form the average.
But we can carry over the procedure to any complete metric space,
any bounded set $E$, and any $A$-average. The geometric interpretation
makes sense in this new setting and we have merely to replace
“min max” by “inf sup.” The equality (4.2.3) is usually replaced
by an inequality.

We proceed as follows. Take $n$ points $P_1, \ldots, P_n$ in the
space $X$ and form the function

(4.2.4) $f(P) = A[d(P, P_1), d(P, P_2), \ldots, d(P, P_n)],$

that is, the average of the distances of the variable point $P$ from the
fixed points $P_1, \ldots, P_n$. This function has a supremum on the
bounded set $E$. The set of all such suprema corresponding to $n$
points $P_j \in X$ is bounded below and we denote its infimum by
$M_n(E)$. Thus

(4.2.5) $M_n(E) = \inf \sup_{P \in E} f(P).$

If $E$ is compact, we can actually find a function $f_n(P)$ which assumes
the value $M_n(E)$ so that “inf sup” becomes “min max.” This
function need not be uniquely determined.

The sequence $\{M_n(E)\}$ is convergent. This follows from the
inequalities

(4.2.6) $M_{m+n} \le \max(M_m, M_n),$

(4.2.7) $M_{n+1} \le A(M_n, \ldots, M_n, b),$

where $M_n$ occurs $n$ times and $b$ is any number $\ge \delta(E)$.

Here the first inequality is of type (1.6.8) with $S = Z^+$. To
verify it, note that we can find functions

$f_1(P) = A[d(P, P_1), \ldots, d(P, P_m)],$

$f_2(P) = A[d(P, Q_1), \ldots, d(P, Q_n)]$

whose suprema on $E$ differ from $M_m(E)$ and $M_n(E)$, respectively, by
less than a preassigned $\varepsilon$. We then form

$A[d(P, P_1), \ldots, d(P, P_m), d(P, Q_1), \ldots, d(P, Q_n)]$

whose supremum on $E$ is at least $M_{m+n}(E)$. On the other hand, for
all $P$ this expression equals

$A[f_1(P), \ldots, f_1(P), f_2(P), \ldots, f_2(P)] \le \max[f_1(P), f_2(P)]$

and on $E$ this is at most equal to the larger of the numbers $M_m + \varepsilon$,
$M_n + \varepsilon$. This proves (4.2.6), and (4.2.7) is proved by a similar
argument.

The first inequality shows that $M_n(E) \le M_1(E)$ for all $n$ so
$\{M_n(E)\}$ is a bounded sequence. Set

$\limsup M_n = \beta.$

If there is a $j$ and a number $\gamma$ such that

$M_j < \gamma, \quad M_{j+1} < \gamma,$

then (4.2.6) shows that $M_k < \gamma$ for all large $k$ ($k > j^2$ will do).
It follows that such a number $\gamma$ must be $\ge \beta$. Suppose now that
there is a $j$ and a $\gamma$ such that $M_j < \gamma < \beta$. Then by (4.2.6)
$M_k < \gamma$ for infinitely many values of $k$, for instance for $k = j 2^m$.
Then (4.2.7) shows that for such a $k$

$M_{k+1} \le A(M_k, \ldots, M_k, b) < A(\gamma, \ldots, \gamma, b),$

where $M_k$ and $\gamma$ are each repeated $k$ times. By Proposition 23
the last member is arbitrarily close to $\gamma$ for large values of $m$. Thus
we again obtain two consecutive numbers $M_k$ and $M_{k+1}$, both less
than a fixed number $< \beta$, and we have again a contradiction. This
shows that

(4.2.8) $M_n(E) \ge \limsup M_k(E)$

for all $n$ and

(4.2.9) $\lim M_n(E) = \chi(E)$

exists.

We have

(4.2.10) $\delta_0(E) \ge \chi(E)$

and strict inequality may hold.

To show this, choose $n$ points $P_1, \ldots, P_n$ in $E$ such that for a
given $\varepsilon > 0$

$A[d(P_j, P_k)] = \delta_n(E) - \varepsilon.$

Then choose a point $P_{n+1} \in E$ such that

$A[d(P_{n+1}, P_1), \ldots, d(P_{n+1}, P_n)] \ge M_n(E).$

This is possible. Then form the average of all distances between
the $n + 1$ points. This is at most equal to $\delta_{n+1}(E)$. On the other
hand, the distances form two groups, those involving $P_{n+1}$ and
those that do not. The average for the first group is $\ge M_n(E)$,
that for the second is $\delta_n(E) - \varepsilon$. It follows that

$\delta_{n+1}(E) \ge \min[\delta_n(E) - \varepsilon, M_n(E)].$

This must hold for all $\varepsilon$ so we have

$\delta_{n+1}(E) \ge \min[\delta_n(E), M_n(E)] = M_n(E)$

for all $n$ and (4.2.10) follows.

Examples for the inequality are found in Pólya and Szegő [69].
Thus, for instance, in $R^3$ with $E$ taken as the unit ball and $A$ being
$p$th power means, $F(u) = u^p$ with $-1 < p < 2$, we have

(4.2.11) $A\text{-}\delta_0(E) = 2(1 + \tfrac{1}{2}p)^{-1/p} > A\text{-}\chi(E) = 1.$
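
The value $2(1 + \tfrac{1}{2}p)^{-1/p}$ in (4.2.11) can be compared with a direct computation. The sketch below assumes that the extremal distribution is the uniform one on the surface of the unit sphere in $R^3$, and it only verifies that the $p$th power mean of the chord length for that distribution equals the quoted value; the function names and the midpoint rule are illustrative choices.

```python
import math


def sphere_power_mean(p, steps=100_000):
    """p-th power mean of |x - y| for x, y independent and uniform on the unit sphere in R^3."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h                # midpoint rule in the angle between x and y
        chord = 2.0 * math.sin(theta / 2.0)  # |x - y| for unit vectors at angle theta
        # 0.5 * sin(theta) is the probability density of the angle theta
        total += chord ** p * 0.5 * math.sin(theta) * h
    return total ** (1.0 / p)


def closed_form(p):
    """The value 2 (1 + p/2)^(-1/p) quoted in (4.2.11)."""
    return 2.0 * (1.0 + 0.5 * p) ** (-1.0 / p)


for p in (-0.5, 0.5, 1.0, 1.5):
    print(p, round(sphere_power_mean(p), 5), round(closed_form(p), 5))
```

For $p = 1$ both columns give $4/3$, the mean distance between two random points of the unit sphere.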

4.3. Special Cases. For the $p$th power means Pólya and Szegő
listed transfinite diameters and Čebyšev constants for the following
configurations: two points, a line segment, a circle, a circular disk,
the surface of a sphere, and the solid sphere. They also indicated
the nature of the equilibrium distribution on these objects for the
corresponding potentials.

The problem of finding the $A$-diameter of the unit sphere in
a Banach space leads to some interesting results [40]. Let $X$ be
a Banach space over complex numbers. Let $E$ be a c-star with
respect to the origin, that is, $x_0 \in E$ implies $\zeta x_0 \in E$ for any
complex number $\zeta$ with $|\zeta| \le 1$. If $\rho$ is the $A$-diameter of the unit disk
in the complex plane and if $r = \sup \|x_0\|$, $x_0 \in E$, then

(4.3.1) $A\text{-}\delta_0[E] \ge \rho r.$

This holds in particular if $E$ is the unit sphere and here $r = 1$.

In any Banach space $X$ with unit sphere $U$

(4.3.2) $A\text{-}\delta_0[U] \le 2,$

and equality holds in a number of cases, in particular for

$X$: $C[a, b]$, $L^1(a, b)$, $L^\infty(a, b)$, $l$, $m$.

The Lebesgue spaces $L^p(0, 1)$, $1 < p < \infty$, present some unsolved
problems. In such a space the elements of the orthonormal system
$[\exp(2k\pi i t)]$ are equidistant. This gives

(4.3.3) $A\text{-}\delta_0[U^{(p)}] \ge 2 \left[ \dfrac{\Gamma(\tfrac{1}{2}p + \tfrac{1}{2})}{\Gamma(\tfrac{1}{2})\,\Gamma(\tfrac{1}{2}p + 1)} \right]^{1/p}, \qquad 1 \le p \le \infty.$

This can be improved to

(4.3.4) $A\text{-}\delta_0[U^{(p)}] \ge 2^{1/p}, \qquad 1 \le p \le 2.$
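
The Gamma-function bound (4.3.3) is the common distance between two members of the exponential system. The sketch below is an illustration under assumptions that should be read as such: the space is taken to be $L^p(0, 1)$, the system is taken to be $\exp(2k\pi i t)$, and the right member of (4.3.3) is taken to be $2[\Gamma(\tfrac{1}{2}p + \tfrac{1}{2}) / (\Gamma(\tfrac{1}{2})\Gamma(\tfrac{1}{2}p + 1))]^{1/p}$. The function names are hypothetical.

```python
import math


def lp_distance(j, k, p, steps=100_000):
    """L^p(0, 1) norm of exp(2*pi*i*j*t) - exp(2*pi*i*k*t), by the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        # |e^{2 pi i j t} - e^{2 pi i k t}| = 2 |sin(pi (j - k) t)|
        total += abs(2.0 * math.sin(math.pi * (j - k) * t)) ** p * h
    return total ** (1.0 / p)


def gamma_bound(p):
    """2 * [Gamma((p+1)/2) / (Gamma(1/2) * Gamma(p/2 + 1))]^(1/p)."""
    ratio = math.gamma(0.5 * p + 0.5) / (math.gamma(0.5) * math.gamma(0.5 * p + 1.0))
    return 2.0 * ratio ** (1.0 / p)


for p in (1.0, 1.5, 2.0, 3.0):
    print(p, round(lp_distance(3, 1, p), 5), round(gamma_bound(p), 5), round(2.0 ** (1.0 / p), 5))
```

The first two columns agree for every pair $j \ne k$, which is the equidistance property. For $1 \le p \le 2$ the third column, $2^{1/p}$, is at least as large, in line with (4.3.4), and the two bounds coincide at $p = 2$.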

Consider the Euclidean $n$-space, $R^n$, and its unit ball $U^n$. If a
certain choice of $n$ unit vectors gives an $A$-average equal to
$A\text{-}\delta_n[U^n]$, then these vectors also lie in $U^{n-1}$. For in any case their
endpoints determine a hyperplane the equation of which can be
taken to be $x_n = \rho$ with $0 \le \rho < 1$. If the maximizing vectors are

$v_k = (x_{1,k}, \ldots, x_{n-1,k}, \rho), \quad k = 1, 2, \ldots, n,$

and if $\rho > 0$, then

$u_k = (1 - \rho^2)^{-1/2} (x_{1,k}, \ldots, x_{n-1,k}, 0)$

are unit vectors and

$\|u_j - u_k\| = (1 - \rho^2)^{-1/2} \|v_j - v_k\|,$

so that

$A[\|u_j - u_k\|] > A[\|v_j - v_k\|] = \delta_n(U^n)$

which is absurd. Hence $\rho = 0$ and the maximizing vectors belong
to $U^{n-1}$. This shows that

(4.3.5) $A\text{-}\delta_n[U^n] = A\text{-}\delta_n[U^{n-1}].$

POTENTIAL THEORY 

5.0. Orientation. We shall discuss some modern aspects of general
potential theory in $R^m$, $m \ge 1$. Let $x = (x_1, x_2, \ldots, x_m)$ denote
a point in $R^m$ and write $\|x\|$ for the length of the vector $x$.

Classical potential theory is based on the use of certain kernels
which satisfy Laplace’s equation, $\log(1/r)$ for $m = 2$ and $r^{2-m}$
for $m \ne 2$. The use of more general kernels goes back to Pólya
and Szegő [69], M. Riesz [70], and Frostman [32] in the early 1930’s.
The relation between capacity and transfinite diameter goes back to
Szegő [76] in 1924. For the general notion of capacity see C.
Choquet [15] and L. Carleson [11]. The latter’s paper on exceptional
sets contains a bibliography of close to 600 items. See also
E. Cartan [12], J. Deny [18], and K. Kunugui [53].

There are five sections: Capacity, Capacitary and Equilibrium
Potentials, Green’s Function, Some Special Kernels, and A Problem.

5.1. Capacity. Let $K(r)$ be a positive nonincreasing continuous
function defined for $r > 0$ with $\lim_{r \downarrow 0} K(r) \le \infty$. If $\sigma$ is a measure
of compact support in $R^m$, the $K$-potential of $\sigma$ is by definition

(5.1.1) $u(x \mid \sigma, K) = \int K(\|x - y\|) \, d\sigma(y),$

and the corresponding energy integral is

(5.1.2) $I(\sigma, K) = \iint K(\|x - y\|) \, d\sigma(y) \, d\sigma(x),$

the integrals taken over the support of $\sigma$.

If $E$ is a compact set in $R^m$ and $\Gamma(E)$ is the set of all positive
measures $\mu$ with total mass 1 concentrated on $E$, then

(5.1.3) $C_K(E) = \{\inf I(E, \mu, K)\}^{-1}, \quad \mu \in \Gamma(E),$

is the $K$-capacity of $E$. In particular, the cases $K(r) = \log(1/r)$
and $r^{-\alpha}$ give rise to logarithmic capacity and $\alpha$-capacity, respectively.

For a given $K$, there are two possibilities: (i) $C_K(E) = 0$ or
(ii) $0 < C_K(E) < \infty$. The problem of characterizing sets $E$ with
$C_K(E) = 0$ in terms of other properties has received much attention.

At this juncture let us recall the definition of Hausdorff measure.
Let $h(t)$ be a continuous increasing function on $[0, 1)$ with $h(0) = 0$,
$\int_0^1 [h(t)/t] \, dt < \infty$. Let $\rho$ be given and cover the set $E$ by a finite
or countably infinite set of spheres $S_j$, where the radius of $S_j$ is
$r_j \le \rho$, and set

$H(\rho) = \inf \sum h(r_j)$

for all possible coverings of $E$. This is a nonincreasing function of
$\rho$, since a smaller $\rho$ admits fewer coverings, and

(5.1.4) $m_h(E) \equiv \lim_{\rho \downarrow 0} H(\rho)$

is the Hausdorff measure of $E$ with respect to the measuring function
$h$. It is an outer measure in the sense of Carathéodory and
hence a countably additive set function on Borel sets. The case
$h(t) = t^\alpha$ gives $\alpha$-dimensional measure, $[\log(1/t)]^{-1}$ logarithmic
measure (integrability condition not satisfied!).

It was proved by Frostman [32] that if a closed set $E$ in the plane
has logarithmic capacity 0 then its Hausdorff measure is 0 with
respect to each admissible measuring function $h$. He also showed
that the logarithmic capacity is positive if some Hausdorff measure
is positive, and this also extends to $\alpha$-capacities provided the
Hausdorff measure satisfies the stronger restriction that
$h(t) t^{-\alpha-1} \in L(0, 1)$. In the other direction Carleson [10] gave an example of a
closed set $E$ of positive logarithmic capacity such that every
admissible Hausdorff measure vanishes. He also gave a simple proof of
a result of Erdős and Gillis [20] according to which a set of finite
logarithmic measure always is of logarithmic capacity 0. This
shows that these two ways of measuring sets are fundamentally
different.

An interesting limit theorem has recently been proved by Hans
Wallin [82]. Let $E$ be a compact set in $R^m$ and let $M[E]$ be its
$m$-dimensional Lebesgue measure, $C_\alpha[E]$ its $\alpha$-capacity, $\alpha < m$.
Then

(5.1.5) $\lim_{\alpha \uparrow m} \dfrac{C_\alpha[E]}{m - \alpha} = kM(E),$

where $k$ is a function of $m$ alone.

Evans [23] proved that if a compact set $E$ in $R^m$ has $\alpha$-capacity 0,
$0 \le \alpha < m$ and $\alpha = 0$ means logarithmic capacity, then there exists
a positive measure $\mu$ on $E$ such that the $\alpha$-potential of $\mu$ is infinite
everywhere on $E$. Wallin (loc. cit.) proved that $E$ has $\alpha$-capacity 0
if and only if every function $f$ continuous in $E$ coincides in $E$ with a
continuous $\alpha$-potential of a measure with compact support. This
potential is continuous in the whole space and belongs to $C^\infty$ outside
of $E$. This result holds also for more general kernels $K$. Further
he shows that if $f$ is continuous in $E$ and $C_\alpha(E) = 0$, then $f$ can be
extended to a function whose partial derivatives of certain orders
belong to $L^p(R^m \ominus E)$ where the orders depend upon $\alpha$ and $p$.
See also [81] where the following result is proved. Let $U^m$ be the
open unit sphere in $R^m$, $m \ge 2$, and $E$ a closed subset of $U^m$. Then
$C_{m-2}(E) = 0$ if and only if every $f$ continuous on $E$ can be extended
to a function $u$, harmonic on $U^m$ and continuous on its closure, which
has a finite Dirichlet integral over $U$ and coincides with $f$ in $E$.

5.2. Capacitary and Equilibrium Potentials. For questions where
zero capacity or positive capacity is the only property that matters,
formula (5.1.3) which goes back to M. Riesz [70] gives the most
convenient definition of $K$-capacity. But for connections with other
fields it is better to use the definition of Frostman [32]. To
distinguish between the two varieties, that of Frostman will be denoted
by $C_K^*(E)$ and called the star capacity. It is defined by

(5.2.1) $K[C_K^*(E)] = \inf I(E, \mu, K) \equiv V_K(E), \quad \mu \in \Gamma(E).$

If $K(0) = \infty$, the two definitions are consistent in the sense that
$C_K(E) = 0$ if and only if $C_K^*(E) = 0$.

Suppose now that $C_K^*(E) > 0$. It is then possible to find a
sequence of positive measures $\mu_n \in \Gamma(E)$ such that

(i) $I(E, \mu_n, K) \to V_K(E)$.

(ii) There is a $\mu_0 \in \Gamma(E)$ such that

(5.2.2) $\int_E f(x) \, d\mu_n(x) \to \int_E f(x) \, d\mu_0(x)$

for every $f$ continuous in $E$.

(iii) $I(E, \mu_0, K) = V_K(E)$.

If (5.2.2) holds, $\{\mu_n\}$ is said to converge weakly to $\mu_0$. If $\mu_0$ is
unique, it is called the equilibrium distribution on $E$ and $u(x \mid \mu_0, K)$
is called the equilibrium potential. Uniqueness will occur if $\partial E$ is
sufficiently regular. In any case $u(x \mid \mu_0, K)$ is a capacitary potential
and $\mu_0$ is a capacitary distribution.

If $\mu_0$ is unique it can be obtained in the following manner. $K$ is
strictly monotone for $r > 0$ and defines an admissible $A$-average
$A_K$ by formula (3.1.2) replacing $F$ by $K$. Since $E$ is compact we
can find points $x_1, x_2, \ldots, x_n$ in $E$ such that

(5.2.3) $A_K(\|x_j - x_k\|) = \delta_n(E),$

(5.2.4) $\binom{n}{2} K[\delta_n(E)] = \sum_{1 \le j < k \le n} K(\|x_j - x_k\|).$

At each of the points $x_j$ we place a mass $1/n$. This defines a
$\mu_n \in \Gamma(E)$ and

(5.2.5) $I(E, \mu_n, K) = K[\delta_n(E)] \to K[\delta_0(E)], \quad n \to \infty.$

Hence

(5.2.6) $A_K\text{-}\delta_0(E) \le C_K^*(E),$

and it may be shown that equality holds. Szegő [76] proved equality
for the logarithmic case, Frostman [32] for $K = r^{-\alpha}$.

If $C_K^*(E) = 0$, it is no longer necessarily true that the sequence
$\{\mu_n\}$ converges to a limit. For the case $K = r^{-1}$ Terasaka [77]
constructed a closed countable set $E$ such that the sequence

$u_n(x) = \int_E K(\|x - y\|) \, d\mu_n(y)$

does not converge for every $x$ in $C(E)$, that is, $\mu_n$ does not converge
weakly. Wallin [81] has observed that such a construction can be
carried through for every admissible kernel $K$.

If $\nu$ is a capacitary distribution for $E$, then

$u(x) = u(x \mid \nu, K)$

has the following properties:

(i) $u(x) \ge V_K(E)$ for all $x \in E$ except perhaps for a set of
$K$-capacity 0.

(ii) $u(x) \le V_K(E)$ everywhere on the support of $\nu$.

(iii) $u(x) \le A V_K(E)$ for all $x$, where the constant $A$ depends
only on $m$.

Here the first two properties follow from the variation principle
of Gauss, modified by Frostman, whereas the third is due to
Ugaheri [79].

Continuity properties of $C_K(E)$ with respect to $K$ have been
examined by Wallin [81]. If $K_n \to K$ and $C_{K_n}(E) > 0$ while
$C_K(E) = 0$, and if $\mu_n$ is a capacitary distribution for $E$ and $K_n$,
does $\{\mu_n\}$ converge weakly to a limit? This turns out to be the case
if $E$ has positive $m$-dimensional Lebesgue measure and $K_n(r) r^{m-1} \in
L(0, 1)$ while $K(r) r^{m-1}$ does not. Various plausible conjectures for
such problems are disproved by counterexamples.

5.3. Green’s Function. Let $E$ be a compact set of positive $K$-capacity
in $R^m$ and suppose that $\mu$ is the equilibrium distribution of $E$.
Let $z$ be exterior to $E$. The function of Green relative to the set $E$,
the point $z$, and the kernel $K$ is of the form

(5.3.1) $G(x, z; E) = K(\|z - x\|) - \int_E K(\|y - x\|) \, d_y \nu(y, z).$

Here the measure $\nu(y, z)$ is to be chosen so that

$G(x, z; E) \ge 0$

everywhere, with equality in $E$ excepting at most a set of points $x$ of
capacity zero.

Since

$\int_E \int_E K(\|t - s\|) \, d_s \nu(s, x) \, d_t \nu(t, z)$

is a symmetric function of $x$ and $z$, one obtains that so is $G$:

(5.3.2) $G(x, z; E) = G(z, x; E).$

The existence of such a function is of basic importance for potential
theory. For if such a function exists,

$\int_E G(x, z) \, d\mu(x) = 0, \quad \mu \in \Gamma(E),$

implies that

(5.3.3) $u(z) = \int_E u(y) \, d_y \nu(y, z)$

for any potential function

(5.3.4) $u(x) = \int_E K(\|x - y\|) \, d\mu(y).$

The existence of $G(x, z; E)$ has been proved for special kernels $K$
under rather general assumptions on $E$. For the case of the
Newtonian potential, $K(r) = r^{-1}$, there is a proof in Frostman [32],
for the more general kernel $r^{-\alpha}$ in M. Riesz [70]. We shall devote
the remainder of this section to the logarithmic case.

We assume that the compact set $E$ in the complex plane has a
(unique) equilibrium distribution $\mu_0$ with respect to $\log(1/r)$ and
that $V(E)$ is the minimum value of the energy integral. Then

(5.3.5) $G(z, \infty; E) = V(E) - u(z \mid \mu_0, E)$

is by definition Green's function for $\infty$ and $C(E)$, the complement
of $E$. It is harmonic in $C(E)$, tends to zero as $z$ tends to any point
on $\partial E$ excepting a set of capacity zero, and it becomes infinite as
$\log |z|$ when $|z| \to \infty$.

For an arbitrary point $z_0 \in C(E)$ we proceed as follows. The
transformation

(5.3.6) $w = \dfrac{1}{z - z_0}$

maps $E$ into a compact set $F$ in the $w$-plane. It takes functions
harmonic in $C(E) \ominus \{\infty\}$ into functions harmonic in $C(F) \ominus \{z_0\}$.
Let $\mu_1$ be the equilibrium distribution on $F$ and let $V(F)$ be the
minimum of the energy integral. Then

$G(w, \infty; F) = V(F) - u(w \mid \mu_1, F)$

and we set

(5.3.7) $G(z, z_0; E) = G(w, \infty; F).$

This function evidently has the desired properties and since Green’s
function is unique when it exists, it follows that this is the correct
definition.

Let $E$ be a bounded continuum whose complement $C(E)$ is
simply connected. Szegő had observed that the logarithmic
capacity of $E$ coincides with the (geometric) transfinite diameter and
Fekete [27] could prove that the common value agrees with that
of the exterior mapping radius of $E$.

Let $F_n(z; E)$ be the $n$th Fekete polynomial of $E$, that is,

(5.3.8) $F_n(z; E) = F_n(z) = \prod_{j=1}^{n} (z - z_{j,n})$

where the points $z_{j,n} \in E$ are chosen so that

$\prod_{j<k} |z_{j,n} - z_{k,n}| = \max_{t_1, \ldots, t_n \in E} \prod_{j<k} |t_j - t_k|.$

Set

(5.3.9) $f_n(z) = [F_n(z)]^{1/n}$

with the $n$th root so chosen that

$\lim_{z \to \infty} z^{-1} f_n(z) = 1.$

A basic property of the Fekete polynomials is that

(5.3.10) $\lim_{n \to \infty} \max |f_n(z)| = \delta_0(E),$

the (geometric) transfinite diameter of $E$.

The functions $\{f_n(z)\}$ are holomorphic in $C(E)$ and

(5.3.11) $\lim_{n \to \infty} f_n(z) = f(z)$

exists uniformly on compact subsets of $C(E)$. The limit function
maps $C(E)$ conformally on

$|w| > \delta_0(E),$

and since $f(z)$ has an expansion of the form

(5.3.12) $f(z) = z + \sum_{k=1}^{\infty} A_k z^{1-k}$

for large values of $|z|$, it follows that $\delta_0(E)$ equals the exterior
conformal mapping radius of $E$ as asserted.
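
The limit (5.3.10) can be watched in the simplest case. The sketch below assumes $E$ is the closed unit disk and uses the standard fact that the $n$th roots of unity are then a set of Fekete points, so that $F_n(z) = z^n - 1$; the helper names and the sampling of the unit circle are hypothetical choices.

```python
import cmath
import math
from itertools import combinations


def roots_of_unity(n):
    """The n-th roots of unity, taken here as Fekete points of the unit disk."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


def geometric_mean_distance(points):
    """Geometric mean of the n(n-1)/2 mutual distances, i.e. delta_n for these points."""
    logs = [math.log(abs(a - b)) for a, b in combinations(points, 2)]
    return math.exp(sum(logs) / len(logs))


def max_abs_f_n(points):
    """max over the unit circle of |F_n(z)|^(1/n), sampled at 64 n equally spaced points."""
    n = len(points)
    samples = 64 * n
    best = 0.0
    for s in range(samples):
        z = cmath.exp(2j * math.pi * s / samples)
        modulus = math.prod(abs(z - zj) for zj in points)  # |F_n(z)|
        best = max(best, modulus ** (1.0 / n))
    return best


for n in (2, 4, 8, 16, 32):
    pts = roots_of_unity(n)
    print(n, round(geometric_mean_distance(pts), 4), round(max_abs_f_n(pts), 4))
```

Under these assumptions the second column equals $n^{1/(n-1)}$ and the third equals $2^{1/n}$; both tend to 1, the transfinite diameter of the unit disk.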

On the other hand, if $\mu_n$ is the distribution of mass on $E$ where
each point $z_{j,n}$ gets the mass $1/n$, then

$-\log |f_n(z)| = \int_E \log \dfrac{1}{|z - t|} \, d\mu_n(t).$

As $n \to \infty$ the left member converges to $-\log |f(z)|$ and $\mu_n$
converges weakly to the equilibrium distribution on $E$ so that

(5.3.13) $-\log |f(z)| = \int_E \log \dfrac{1}{|z - t|} \, d\mu_0(t) = u(z \mid \mu_0, E),$

the equilibrium potential on $E$. Here all three members are harmonic
in $C(E)$; the third member approaches $V(E)$ on $\partial E$, excepting
at most a set of capacity 0, whereas the first member approaches
$-\log \delta_0(E)$ almost everywhere on $\partial E$. It follows that

$-\log \delta_0(E) = V(E) = -\log C^*(E)$

or

(5.3.14) $\delta_0(E) = C^*(E)$

as asserted above. We see also that

(5.3.15) $G(z, \infty; E) = \log |f(z)| - \log C^*(E).$

In view of the preceding construction Green’s function, $G(z, z_0; E)$
as well as $G(z, \infty; E)$, can be obtained by the same averaging process
as gives the transfinite diameter. It is not clear if a similar result
holds in $R^3$ for the Newtonian potential.

The coefficients $A_k$ of (5.3.12) are expressible in terms of the
moments of the equilibrium distribution

(5.3.16) $M_n = \int_E t^n \, d\mu_0(t).$

We have

(5.3.17) $\dfrac{f'(z)}{f(z)} = \sum_{k=0}^{\infty} M_k z^{-k-1},$

and

$A_k = \sum \dfrac{(-1)^{p_1 + p_2 + \cdots + p_k}}{p_1! \, p_2! \cdots p_k!} \left(\dfrac{M_1}{1}\right)^{p_1} \left(\dfrac{M_2}{2}\right)^{p_2} \cdots \left(\dfrac{M_k}{k}\right)^{p_k},$

where the summation extends over those non-negative integers $p_j$
such that

$p_1 + 2p_2 + \cdots + kp_k = k.$

The expression of $A_k$ in terms of $M_j$ coincides with the Newton-Waring
formula for the $k$th symmetric function of $n$ variables,
$n > k$, in terms of the corresponding power sums.
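
A small computation makes the relation between the moments $M_j$ and the coefficients $A_k$ concrete. The sketch reads the formula as $A_k = \sum \prod_j (-M_j/j)^{p_j} / p_j!$, summed over $p_1 + 2p_2 + \cdots + kp_k = k$, which is what follows from $f(z) = z \exp(-\sum_{k \ge 1} M_k z^{-k}/k)$. The test set $E = [-2, 2]$ is an assumption made here: its equilibrium distribution is the arcsine law with moments $M_{2n} = \binom{2n}{n}$, and $f(z) = \tfrac{1}{2}(z + \sqrt{z^2 - 4})$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def partitions(k, largest=None):
    """Yield exponent maps {j: p_j} with sum of j * p_j equal to k."""
    if largest is None:
        largest = k
    if k == 0:
        yield {}
        return
    for j in range(min(k, largest), 0, -1):       # j is the largest part used
        for p in range(1, k // j + 1):             # p is its multiplicity
            for rest in partitions(k - j * p, j - 1):
                yield {j: p, **rest}


def coefficient(k, moments):
    """A_k as the sum over partitions of prod_j (-M_j / j)^(p_j) / p_j!."""
    total = Fraction(0)
    for exponents in partitions(k):
        term = Fraction(1)
        for j, p in exponents.items():
            term *= Fraction(-moments[j], j) ** p / factorial(p)
        total += term
    return total


# Assumed example: E = [-2, 2]; even moments are central binomial coefficients.
moments = {j: (comb(j, j // 2) if j % 2 == 0 else 0) for j in range(1, 9)}
coeffs = [coefficient(k, moments) for k in range(1, 9)]
print(coeffs)
```

For this set the nonzero coefficients are the negatives of the Catalan numbers, matching the expansion $f(z) = z - z^{-1} - z^{-3} - 2z^{-5} - 5z^{-7} - \cdots$.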

The argument given above goes back to F. Leja [55]. See also
Hille [37, II, p. 339 et seq.]. The Fekete polynomials $F_n(z; E)$
can be replaced by the corresponding Čebyšev polynomials with the
same result. In fact, it follows from G. Faber [24] that the mapping
function is the limit of $n$th roots of other polynomials associated
with the compact set $E$.

5.4. Some Special Kernels. We return again briefly to $m$-dimensional
space $R^m$. Here the power kernels have an important composition
property which becomes more striking if we modify the
definitions. We set

(5.4.1) $R_\alpha(x) = \dfrac{\Gamma[\tfrac{1}{2}(m - \alpha)]}{\pi^{m/2} \, 2^{\alpha} \, \Gamma(\tfrac{1}{2}\alpha)} \, \|x\|^{\alpha - m}, \quad 0 < \alpha < m.$

This is the original definition of M. Riesz [70] who was led to it in an
effort to generalize the fractional, Riemann-Liouville integral in one
dimension. The composition property is based on the semigroup
property of the kernel with respect to the parameter

(5.4.2) $R_{\alpha+\beta}(x) = R_\alpha * R_\beta(x), \quad 0 < \alpha, \; 0 < \beta, \; \alpha + \beta < m,$

or written out in full

(5.4.3) $R_{\alpha+\beta}(x) = \int_{R^m} R_\alpha(s) R_\beta(x - s) \, ds.$

This kernel gives an $\alpha$-potential

(5.4.4) $u(x \mid \mu, \alpha) = \int_E R_\alpha(x - y) \, d\mu(y)$

with the composition property

(5.4.5) $u(x \mid \mu, \alpha + \beta) = u[u(x \mid \mu, \alpha) \mid \mu, \beta].$

If $\Delta$ denotes the Laplacian in $R^m$ we have also

(5.4.6) $\Delta u(x \mid \mu, \alpha + 2) = -u(x \mid \mu, \alpha).$

The Riesz potentials suffer from one defect, namely the limitation
imposed on the parameter $\alpha$. Riesz has indicated how this handicap
can be overcome by analytic continuation with respect to $\alpha$
provided the mass distribution is sufficiently regular.

Another device has been used successfully by N. Aronszajn and
K. T. Smith [6] in a recent study of what they call Bessel potentials.
These are based on the kernel

(5.4.7) $G_\alpha(x) = \dfrac{2^{(2-m-\alpha)/2}}{\pi^{m/2} \, \Gamma(\tfrac{1}{2}\alpha)} \, K_{(m-\alpha)/2}(\|x\|) \, \|x\|^{(\alpha-m)/2}.$

Here

$K_\nu(z) = \dfrac{\pi}{2} \, \dfrac{I_{-\nu}(z) - I_\nu(z)}{\sin \nu\pi}, \qquad I_\nu(z) = \sum_{k=0}^{\infty} \dfrac{(\tfrac{1}{2}z)^{\nu+2k}}{k! \, \Gamma(\nu + k + 1)},$

so that $K_\nu(z)$ is a modified Bessel function of the third kind. At the
origin $G_\alpha(x)$ behaves as $R_\alpha(x)$ for $\alpha < m$ but is bounded for $\alpha > m$
and it goes exponentially to 0 as $\|x\| \to \infty$. This implies that the
Bessel potential can be defined for all $\alpha > 0$. We have also

(5.4.8) $G_{\alpha+\beta}(x) = G_\alpha * G_\beta(x) = \int_{R^m} G_\alpha(x - y) G_\beta(y) \, dy$

valid for all positive $\alpha$, $\beta$.

5.5. A Problem. In conclusion let us make a remark concerning
a special polynomial series. Let $P_n(z)$ denote either $F_n(z; E)$ or
$T_n(z; E)$ where $E$ is a compact set in the complex plane, Int $(E)$ is
connected and $\ne \emptyset$, and $C(E)$ is simply connected. Let

(5.5.1)

then the series

(5.5.2)

is absolutely convergent in Int $(E)$ and diverges everywhere in
$C(E)$. Its sum is holomorphic in Int $(E)$.

Is every function which is holomorphic in Int $(E)$ representable
by a series (5.5.2) and, if so, is the representation unique?

Geometry

H. S. M. Coxeter


INTRODUCTION

The subject of geometry is so vast that no adequate account of its
recent developments could be given in two lectures. Felix Klein,
David Hilbert, or Wilhelm Blaschke might have been able to give
such an account in a whole course of lectures; but that would be far
beyond my powers, because there are so many branches of the subject
in which I am almost as ignorant as the proverbial man in the
street. I must ask you to forgive me if I concentrate on my own
favorite branches, and I must take the risk of offending various
geometers who will ask why I have not dealt with algebraic geometry,
differential geometry, symplectic geometry, continuous geometry,
metric spaces [1], Banach spaces, linear programming, and so on.

EUCLIDEAN GEOMETRY 

For most of us, our first contact with geometry was in the Euclidean
plane with little problems about triangles and circles. Many
such problems seem prohibitively difficult to solve until we happen
to think of the right approach. A good example is the Steiner-Lehmus
theorem, which says that, if two internal angle-bisectors
of a triangle are equal in length, the triangle must be isosceles.
This was proposed in 1840 by C. L. Lehmus and proved laboriously
by Jacob Steiner. Since then about sixty proofs have been published,
including one by Gilbert and MacDonnell that is surprisingly
easy [34]. A purist might argue that this is an absolute theorem (not
involving Euclid’s Fifth Postulate) and should have an absolute
proof. Such a proof has been given by L. M. Kelly [20, p. 16,
Ex. 4].

Kakeya’s problem [5, pp. 99-101; 6] asks for the plane set of
least area having the “Kakeya property,” which means that a unit
segment can be turned continuously in the set through 180° so as to
return to its original position (reversed). Besicovitch proved that
there is no least area: such a set with sufficiently many holes (like
a lace tablecloth) can have as small an area as we please! In view
of this complete solution, it is natural to modify the problem by
asking for the smallest convex set, or simply connected set, that has
the Kakeya property. It was proved by Julius Pál that the smallest
convex set is the equilateral triangle of unit altitude (area $1/\sqrt{3}$).
It is tempting to imagine that the smallest simply connected set is
bounded by the deltoid or three-cusped hypocycloid (area $\pi/8$),
within which the unit segment can be moved so as to remain a tangent
while both its ends run along the curve. However, a curve
enclosing a considerably smaller area was discovered independently
by Melvin Bloom and I. J. Schoenberg in 1963. Schoenberg
picturesquely describes this star-shaped curve as resembling the
locus of the end of a Foucault pendulum swinging ten thousand
times. The area is less than $\pi/11$. Defining the Kakeya constant
$K$ to be the greatest lower bound of areas of simply connected
regions having the Kakeya property, he raises the problem of
computing $K$. So far, we know only that

$K \le \dfrac{5 - 2\sqrt{2}}{24} \, \pi.$
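
To compare the areas mentioned in this discussion it helps to see them as decimals. The short sketch below takes the quoted areas to be $1/\sqrt{3}$ for the triangle, $\pi/8$ for the deltoid, $\pi/11$ for the rough bound, and $(5 - 2\sqrt{2})\pi/24$ for the Bloom-Schoenberg bound; the dictionary labels are only descriptive.

```python
import math

# Areas quoted in the discussion of Kakeya's problem (unit segment).
areas = {
    "equilateral triangle of unit altitude": 1 / math.sqrt(3),
    "deltoid (three-cusped hypocycloid)": math.pi / 8,
    "rough bound pi / 11": math.pi / 11,
    "Bloom-Schoenberg bound": (5 - 2 * math.sqrt(2)) * math.pi / 24,
}

for name, area in areas.items():
    print(f"{name:40s} {area:.4f}")
```

The values decrease from about 0.577 to about 0.284, so the upper bound for $K$ is indeed slightly below $\pi/11$.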


Fejes Tóth regards $K$ as the case $n = 2$ of the greatest lower
bound of areas of simply connected regions within which a regular
$n$-gon can be continuously turned through $360°/n$. We know only
that this lies strictly between the area of the $n$-gon and the area of
its circumcircle.

All three versions of Kakeya’s problem can be extended to three- 
dimensional space, where we seek a set of least volume in which 
the unit segment can be moved from one direction to any other. 
Now even the case of the convex set is unsolved. Analogy suggests 
the regular tetrahedron of unit altitude, but Eggleston remarks 
that a smaller convex region will suffice. 

ORDERED GEOMETRY 

The name ordered geometry has been given by Artin [2, p. 73] to
the very simple geometry whose only primitive concepts are the
point and the relation of intermediacy or “betweenness.” If $B$ lies
between $A$ and $C$, we write $[ABC]$. The axioms [20, pp. 177-178]
tell us that the relation $[ABC]$ implies $[CBA]$ but not $[BCA]$,
and so on. The segment $AB$ is the set of points $X$ such that $[AXB]$,
the ray $A/B$ is the set of points $Y$ such that $[BAY]$, and the line $AB$
is the union of two points, two rays, and a segment:

$A/B + A + AB + B + B/A.$

Conceived by Pasch in 1882, this geometry was developed by
Peano (1889), Hilbert (1899) and Veblen [65, Chap. I]. A model
for it is easily obtained by considering the interior of a convex
region in real affine or projective space. Conversely, any ordered
space can be extended to a real projective space (of the same number
of dimensions) by the adjunction of ideal elements such as
“points at infinity.” The best treatment of this extension is by
Whitehead [62]. He, following Russell [49, p. 382], called ordered
geometry “descriptive geometry,” thus inviting confusion with
Cayley’s famous dictum “Descriptive geometry is all geometry,”
which really meant “Projective geometry is all geometry.”

Any geometry worthy of the name has to justify itself by exhibiting
interesting theorems. The first such theorem in ordered geometry
is the Sylvester-Gallai theorem:

If $n$ points are not all collinear, there is at least one line containing
exactly two of them.

This was conjectured by Sylvester in 1893, revived by Karamata
and Erdős in 1933, and proved (in a complicated manner) by T.
Gallai. It is always rather exciting when a problem is solved after
remaining a challenge for as long as forty years. My own small
contribution was to adapt Gallai’s Euclidean proof to ordered
geometry [20, pp. 181-182]. L. M. Kelly [20, pp. 65-66] discovered
an “absolute” proof, so simple that anyone who has seen it once
can always remember it.

The Sylvester-Gallai theorem and its extensions play a significant
role in the part of ordered geometry that deals with convexity [25].
Hadwiger [35a] once asked whether every convex polyhedron can
be transformed, by a finite sequence of truncations (cutting off a
corner by means of a plane), into a polyhedron such that each face
has a number of vertices that is a multiple of 3. The answer is yes
or no according as the Four-Color Conjecture (for maps on a sphere)
is true or false! This conclusion follows easily from Heawood’s
congruences [38, pp. 277-278; 5, pp. 230-231].

In $n$ dimensions, a convex polytope can be defined either as the
convex hull of a finite set of points or as a bounded set which is the
intersection of finitely many half-spaces. Supporting hyperplanes
of an $n$-dimensional convex polytope $\Pi_n$ contain $k$-dimensional
elements $\Pi_k$ for $k = 0, 1, \ldots, n - 1$. These are called vertices,
edges, ..., and cells (or facets). It is convenient to include also a
unique $n$-dimensional element, the $\Pi_n$ itself, and a unique
$(-1)$-dimensional element $\Pi_{-1}$, the null set, which belongs to all the other
elements. If an element $\Pi_j$ belongs to an element $\Pi_k$, we say that
$\Pi_j$ and $\Pi_k$ are incident. In particular, every edge $\Pi_1$ is incident
with two vertices $\Pi_0$, and every $\Pi_{n-2}$ is incident with two cells
$\Pi_{n-1}$. More generally, whenever a $\Pi_{k-1}$ and a $\Pi_{k+1}$ are incident,
there are just two $\Pi_k$’s incident with both (for $0 \le k \le n - 1$). In
1852 Schläfli proved that, for a polytope having $N_0$ vertices, $N_1$
edges, and so on,

$N_0 - N_1 + N_2 - \cdots + (-1)^n N_n = 1$

or, if you prefer,

$-N_{-1} + N_0 - N_1 + \cdots + (-1)^n N_n = 0$

[50, p. 190; 22, Chap. IX].

A two-dimensional poly tope n 2 is simply a polygon, which has 
equally many vertices and sides (No — N\ = 0). It follows that, 
in any n n , whenever a n *_2 and a 11*^1 are incident, there are equally 
many n*_i’s and lift’s incident with both (for 1 < k < n — 1). 
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If this number q k depends only on k (that is, if it is the same for 
every occurrence of an incident n*_ 2 and n*+i), the polytope is 
said to be (combinatorially) regular, and is denoted by the “Schlafli 
symbol” 

?2, • • • , q n - 1}. 

For instance, $\{q\}$ is a q-gon, and the ordinary cube has the Schläfli
symbol {4, 3} because each face has four vertices (and four sides)
and each vertex belongs to three edges (and three faces). Schläfli’s
formula (connecting the numbers $N_k$) is easily verified in the cases
of the simplex

$$\{3, 3, \ldots, 3, 3\}, \quad \text{for which } N_k = \binom{n+1}{k+1},$$

and the cross polytope (or “n-dimensional octahedron”)

$$\{3, 3, \ldots, 3, 4\}, \quad \text{for which } N_k = 2^{k+1}\binom{n}{k+1} \quad (k < n).$$

This discussion is the counterpart, in ordered geometry, of the
metrical definition of a regular polytope [22, p. 288; 30, p. 134] as
having a center from which all the $\Pi_k$’s are equidistant, for each
k < n.
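
Schläfli’s formula can be checked mechanically for these two families. The following is a minimal sketch, assuming the element counts $N_k = \binom{n+1}{k+1}$ for the simplex and $N_k = 2^{k+1}\binom{n}{k+1}$ (k < n, with $N_n = 1$) for the cross polytope; the function names are illustrative only.

```python
from math import comb

def simplex_counts(n):
    """N_0, ..., N_n for the n-dimensional simplex (N_n = 1 is the simplex itself)."""
    return [comb(n + 1, k + 1) for k in range(n + 1)]

def cross_polytope_counts(n):
    """N_0, ..., N_n for the n-dimensional cross polytope."""
    return [2 ** (k + 1) * comb(n, k + 1) for k in range(n)] + [1]

def schlafli_sum(counts):
    """Alternating sum N_0 - N_1 + ... + (-1)^n N_n."""
    return sum((-1) ** k * N for k, N in enumerate(counts))

for n in range(1, 9):
    # Both families should give the value 1 in every dimension.
    print(n, schlafli_sum(simplex_counts(n)), schlafli_sum(cross_polytope_counts(n)))

print(cross_polytope_counts(3))  # the octahedron: [6, 12, 8, 1]
```

For n = 3 the second list reproduces the familiar 6 vertices, 12 edges, and 8 faces of the octahedron.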

An n-dimensional convex polytope is said to be m-neighborly
if every m vertices lie in a supporting hyperplane. Thus every
convex polytope is 1-neighborly, and a simplex is m-neighborly for
every $m \le n$. The possibility of other neighborly polytopes (in
four or more dimensions) was recognized long ago by Brückner [8a]
and Carathéodory [10a]. A 2-neighborly polytope may be described
simply as having no diagonals; every two vertices are joined by an
edge. Carathéodory considered a cyclic polytope: the convex hull
of $N_0$ points (in affine 2m-space) on the rational normal curve which
is the locus of the point with barycentric coordinates

$$(1, t, t^2, \ldots, t^{2m}).$$

David Gale [32] has given a charmingly simple proof that such 
a polytope is m-neighborly and has

$$\frac{N_0}{N_0 - m}\binom{N_0 - m}{m}$$

cells (simplexes). There is an unpublished proof by Martin
Fieldhouse that every m-neighborly polytope (with $N_0$ vertices
in 2m dimensions) has this number of cells. In 1957 T. S. Motzkin
conjectured that every m-neighborly polytope is combinatorially
equivalent to a cyclic polytope. Gale established this conjecture
for $N_0 \le 2m + 3$. Grünbaum has pointed out that this inequality
is “best possible,” as one of Brückner’s polytopes is the dual of a
noncyclic 2-neighborly polytope having eight vertices (and twenty
tetrahedral cells).

When $N_{n-1}$ is very large, the boundary of an n-dimensional
polytope resembles an (n - 1)-dimensional tessellation or honeycomb:
an infinite collection of cells fitting together so as to fill and
cover the (n - 1)-space, each $\Pi_{n-2}$ belonging to two $\Pi_{n-1}$’s.
If the $\Pi_{n-1}$’s are regular and alike, the honeycomb is said to be
regular; it evidently has a Schläfli symbol. For instance, {6, 3} is
the familiar pattern of regular hexagons filling the Euclidean plane.

SPHERE PACKING 

It seems desirable to say something about the problem of packing 
equal spheres in Euclidean n-space, not only because of its intrinsic 
interest and its connection with the geometry of numbers, but also 
because it has been found to have a practical application in the 
theory of communication. 

Let $M_n$ denote the maximum number of spheres (balls) that can
touch another of the same size without overlapping, in Euclidean
n-space. Everyone knows that $M_2 = 6$: seven silver dollars can
be placed on a table so that one is surrounded by a ring of six. 
Moreover, this pattern can be extended to a packing of infinitely 
many circles (disks), each touching six others, and the centers of all 
the circles are the points of a lattice. One row of the lattice yields 
the analogous arrangement with n = 1. The one-dimensional 
“ball” is a line segment, and in a row of line segments each touches 
two others: $M_1 = 2$.

When n = 3, we are dealing with solid spheres, ordinary balls. 
In 1694 Sir Isaac Newton told David Gregory that $M_3 = 12$, but
Gregory asserted that $M_3 = 13$. R. Hoppe, 180 years later, proved
that Newton’s value is correct. (For a fuller account of this story,
see [21].) Although there is only one way of surrounding a circle
with six equal circles, there are many ways of surrounding a sphere
with twelve equal spheres. (This may even be done in such a way
that no two of the twelve touch each other.) Two of the possible
arrangements can be continued systematically so that each sphere
touches twelve others [41; 13]. In one of these (described by Kepler
in 1611), the centers form a lattice, namely the face-centered cubic 
lattice [20, pp. 225, 333, 407]. We naturally call this arrangement 
a lattice packing [48a]. 

Let $L_n$ denote the number of spheres that touch each one in the
densest n-dimensional lattice packing. We know that $L_1 = M_1 = 2$,
$L_2 = M_2 = 6$, $L_3 = M_3 = 12$, and $L_n \le M_n$ always. Following
Gauss [33, Vol. 1, p. 307; Vol. 2, p. 192] we can associate with each
lattice a class of equivalent positive definite quadratic forms. In
fact, if the lattice is generated by n vectors $e_1, \ldots, e_n$, its general
point has the position vector

$$\sum x^j e_j,$$

where $x^1, \ldots, x^n$ are integers. The square of the length of this
vector is

$$\sum x^j e_j \cdot \sum x^k e_k = \sum\sum a_{jk}\, x^j x^k,$$

where $a_{jk} = e_j \cdot e_k$. We are thus led to a positive definite quadratic
form; let us denote it by $\Phi$. The smallest positive value
attained by $\Phi$ for integers $x^j$ is clearly equal to the square of the
diameter of the largest equal spheres that can be centered at the
lattice points so as to form a packing (that is, to avoid overlapping).
Let us assume, for simplicity, that our spheres are of diameter 1,
so that 1 is the minimal value of $\Phi$. Since $\det(a_{jk})$ is equal to the
square of the content of the elementary cell of the lattice [20, p.
331], the densest packing occurs when this determinant is as small
as possible. It follows that the search for densest packings is
equivalent to the search for absolutely extreme forms [15, p. 394].

Absolutely extreme forms for $n \le 8$ have been known since 1935.
They were established for $n \le 5$ by Minkowski [42, p. 247] and
for n = 6, 7, 8 by Blichfeldt [7]. Forms equivalent to those of
Minkowski and Blichfeldt may be represented very simply by the
graphs
Each graph is a tree having n nodes, one for each “square” term
$(x^j)^2$, and n - 1 branches, one for each “product” term $-x^j x^k$;
thus the first three trees represent the forms

$$(x^1)^2, \qquad (x^1)^2 - x^1 x^2 + (x^2)^2, \qquad (x^1)^2 - x^1 x^2 + (x^2)^2 - x^2 x^3 + (x^3)^2$$

[20, p. 337, Exercise, where $u^1 = x^1$, $u^2 = x^3$, $u^3 = -x^2$].
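
Since the spheres touching a given one correspond to the integer vectors at which $\Phi$ attains its minimum value 1, the contact numbers for small n can be recovered from these trees by direct enumeration. The sketch below assumes the rule stated above (one square term per node, one term $-x^j x^k$ per branch). The 4-node tree with one central node joined to three others is an assumption about the fourth graph, whose picture is not reproduced here, and the search bound of 2 on each coordinate is likewise an assumption that suffices for these small cases.

```python
from itertools import product

def tree_form(branches):
    """Phi(x) = sum of (x^j)^2 minus sum of x^j x^k over the branches (j, k)."""
    def phi(x):
        return sum(v * v for v in x) - sum(x[j] * x[k] for j, k in branches)
    return phi

def count_minimal_vectors(n, branches, bound=2):
    """Count integer vectors with Phi = 1, each coordinate in [-bound, bound]."""
    phi = tree_form(branches)
    box = range(-bound, bound + 1)
    return sum(1 for x in product(box, repeat=n) if phi(x) == 1)

# Nodes are numbered from 0. The n = 4 tree is a hypothetical reading
# (central node 0 joined to nodes 1, 2, 3).
trees = {
    1: [],
    2: [(0, 1)],
    3: [(0, 1), (1, 2)],
    4: [(0, 1), (0, 2), (0, 3)],
}

contact_numbers = {n: count_minimal_vectors(n, b) for n, b in trees.items()}
print(contact_numbers)
```

For n = 1, 2, 3 the counts agree with the values $L_1 = 2$, $L_2 = 6$, $L_3 = 12$ quoted earlier.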

These trees will be recognized by anyone who followed the lectures
of Kaplansky [40, pp. 125-126]; he used them as symbols for
the Lie algebras $A_1, A_2, A_3, D_4, D_5, E_6, E_7, E_8$. This connection
arises from the fact that every densest packing with $n \le 8$ is symmetrical
by reflection in the common tangent hyperplane at the
point of contact of any two of the spheres that touch each other.
Such reflections generate an infinite discrete group whose fundamental
region is a simplex. When n = 1 the simplex is merely a
line segment. The simplexes for n = 2, . . . , 8 are conveniently
represented by graphs
having n + 1 nodes, one for each bounding hyperplane (or mirror).
Each branch joins two nodes that represent mirrors forming a dihedral
angle of 60°. The remaining pairs of mirrors are orthogonal.

For instance, when n = 2 the packing consists of the incircles of
the hexagons of the regular tessellation {6, 3} (Figure 1), and the
group generated by reflections in all the sides of these hexagons is
adequately generated by reflections in the three sides of the shaded
equilateral triangle. Two of these three sides are orthogonal to
generating vectors of the lattice of centers. (These two vectors
appear in the figure as broken lines.) Each graph in our second list
can be derived from the corresponding tree in the first list by adding
one extra node [15, pp. 405-412].

Figure 1



1 2 3 4 3 2 1        1 2 3 4 5 6 4 2

Figure 2


There is an amusingly simple way to derive the values of $L_n$ from
these augmented graphs. We place the number 1 over the extra
node; then we number the remaining nodes as in Figure 2, so that
the number on each is half the sum of the numbers on the adjacent
nodes. (In particular, the numbers along each “leg” are in arithmetic
progression.) Steinberg [54; see also 22, p. 212] has proved
that $L_n$ is equal to n times the sum of these numbers:

n        1    2    3    4    5    6    7    8

$L_n$    2    6   12   24   40   72  126  240
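
The numbering rule and Steinberg’s product can be checked for the two graphs of Figure 2. This is a sketch under stated assumptions: the chain labels 1 2 3 4 3 2 1 and 1 2 3 4 5 6 4 2 are taken from the figure, while the position and label of the single side node (label 2 attached to the node marked 4 when n = 7, label 3 attached to the node marked 6 when n = 8) are inferred from the half-sum rule, since the drawing itself is not reproduced. The helper names are illustrative.

```python
def chain_with_side_node(chain_labels, attach_at, side_label):
    """Build labels and branches for a chain c0-c1-... plus one side node 's'."""
    labels = {f"c{i}": v for i, v in enumerate(chain_labels)}
    labels["s"] = side_label
    branches = [(f"c{i}", f"c{i + 1}") for i in range(len(chain_labels) - 1)]
    branches.append((f"c{attach_at}", "s"))
    return labels, branches

def steinberg_count(n, labels, branches):
    """Check that each label is half the sum of its neighbours; return n * sum."""
    neighbours = {v: [] for v in labels}
    for u, v in branches:
        neighbours[u].append(v)
        neighbours[v].append(u)
    for v, adj in neighbours.items():
        if 2 * labels[v] != sum(labels[u] for u in adj):
            raise ValueError(f"half-sum rule fails at node {v}")
    return n * sum(labels.values())

L7 = steinberg_count(7, *chain_with_side_node([1, 2, 3, 4, 3, 2, 1], 3, 2))
L8 = steinberg_count(8, *chain_with_side_node([1, 2, 3, 4, 5, 6, 4, 2], 5, 3))
print(L7, L8)  # 126 240
```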

C. Muses (in a letter of December 1963) epitomized these values in
the simple formula

$$L_n = n\left(\left[\frac{2^{n-2}}{3}\right] + n + 1\right) \qquad (1 \le n \le 8),$$

where the square brackets denote the integer part.

Although $L_n$ is unknown for n > 8, we do know that the value
given by this simple formula (for example, 468 for $L_9$) is too large,
and that the densest lattice packing is no longer symmetrical by
reflection in all the common tangent hyperplanes. It has been
conjectured that $L_9 = 272$. In 1946 T. W. Chaundy believed he
had a proof, but he withdrew his paper when he realized he had
made the unjustifiable assumption that the densest (n - 1)-dimensional
packing must occur as a section of the densest n-dimensional
packing.
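
The formula is easy to test against the tabulated values. The sketch below assumes the reading $L_n = n([2^{n-2}/3] + n + 1)$, with the brackets taken as the integer part; that reading is an assumption, chosen because it reproduces all eight tabulated values and the value 468 quoted for n = 9.

```python
def muses(n):
    """n * (floor(2^(n-2) / 3) + n + 1), using floor(2^n / 12) to stay in integers."""
    return n * ((2 ** n) // 12 + n + 1)

tabulated = {1: 2, 2: 6, 3: 12, 4: 24, 5: 40, 6: 72, 7: 126, 8: 240}
agrees = all(muses(n) == L for n, L in tabulated.items())
print(agrees, muses(9))  # True 468
```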

Our knowledge of the numbers $M_n$ is even less satisfactory.
We know that $M_n = L_n$ for $1 \le n \le 3$, and we believe that
$M_4 = 24$; but it is conceivable that $M_4$ is 25 or 26. Obviously
$L_n$ provides a lower bound for $M_n$. An upper bound can be derived
from Schläfli’s work on the content of a simplex in spherical space
(the analogue of the area of a spherical triangle) [50, p. 234; 21].
The conclusion is that

$$M_n \le \left[\frac{2 f_{n-1}(n)}{f_n(n)}\right],$$

where $f_n(x)$ is defined recursively by $f_0(x) = f_1(x) = 1$,

$$f_n(x) = \frac{1}{\pi} \int_{n-1}^{x} \frac{f_{n-2}(x - 2)}{x\sqrt{x^2 - 1}}\, dx.$$

With the aid of an electronic computer, John Leech obtained
the following values:

n                  4    5    6    7    8    9    10    11    12

Bound for $M_n$   26   48   85  146  244  401   648  1035  1637

The most remarkable of these is the case when n = 8, the range of
possibilities being narrowed to

$$240 \le M_8 \le 244.$$

The next case explains why we can be sure that $L_9 \ne 468$. For
large n, the upper bound is asymptotically

$$\frac{\sqrt{\pi}}{e}\, n^{3/2}\, 2^{(n-1)/2}.$$
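
The first few bounds can be reproduced numerically. The sketch below assumes the recursion reads $f_n(x) = \frac{1}{\pi}\int_{n-1}^{x} f_{n-2}(t-2)\,dt / (t\sqrt{t^2-1})$ with $f_0 = f_1 = 1$, and that the bound is $2f_{n-1}(n)/f_n(n)$. Under that reading $f_2$ and $f_3$ have closed forms in terms of the arcsecant, and $f_4$ needs one numerical integration (Simpson’s rule after the substitution $t = 3 + s^2$, a choice made here only to smooth the integrand at the lower limit).

```python
from math import acos, pi, sqrt

def f2(x):
    # integral from 1 to x of dt / (t sqrt(t^2 - 1)) equals arcsec(x)
    return acos(1.0 / x) / pi

def f3(x):
    # same integrand, lower limit 2: arcsec(x) - arcsec(2)
    return (acos(1.0 / x) - acos(0.5)) / pi

def f4(x, steps=2000):
    # (1/pi) * integral from 3 to x of f2(t - 2) dt / (t sqrt(t^2 - 1)),
    # evaluated by Simpson's rule in the variable s, where t = 3 + s^2.
    upper = sqrt(x - 3.0)
    h = upper / steps

    def g(s):
        t = 3.0 + s * s
        return f2(t - 2.0) * 2.0 * s / (t * sqrt(t * t - 1.0))

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0 / pi

def upper_bound(n):
    """Unrounded value of 2 f_{n-1}(n) / f_n(n) for n = 2, 3, 4."""
    f = {1: lambda x: 1.0, 2: f2, 3: f3, 4: f4}
    return 2.0 * f[n - 1](n) / f[n](n)

for n in (2, 3, 4):
    print(n, round(upper_bound(n), 3))
```

The integer parts for n = 3 and n = 4 are 13 and 26, matching Gregory’s value and the first entry of Leech’s table; for n = 2 the bound is exactly 6 up to rounding error.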


INTEGRAL QUATERNIONS AND INTEGRAL OCTAVES 

The regularity of the arrangement when n = 1 or 2, and the
near-regularity when n = 4 or 8, are connected with the rings of
rational integers, Eisenstein integers ($a + b\rho$ where $\rho = e^{2\pi i/3}$),
integral quaternions, and integral octaves. This application of
geometry to the theory of numbers is, perhaps, worthy of a little
more attention. Consider, for instance, the representation of the
quaternion

$$x = x_0 + x_1 i + x_2 j + x_3 k$$

(where $i^2 = j^2 = k^2 = ijk = -1$) by the point $(x_0, x_1, x_2, x_3)$
in Euclidean four-space, or by the vector x that takes the origin to
this point. In Hamilton’s notation, the conjugate of x is

$$\bar{x} = x_0 - x_1 i - x_2 j - x_3 k$$

and the scalar part of x is $Sx = x_0 = \tfrac{1}{2}(x + \bar{x})$. The norm

$$Nx = x\bar{x} = \bar{x}x = x_0^2 + x_1^2 + x_2^2 + x_3^2$$

appears as the square of the length of the vector x. Multiplication
is not always commutative. The conjugate of the product xy is
easily seen to be $\bar{y}\bar{x}$; therefore the conjugate of $x\bar{y}$ is $y\bar{x}$. The inner
product of two vectors x and y is equal to the scalar part of the
quaternion product $x\bar{y}$:

$$x \cdot y = x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3 = S(x\bar{y}) = \tfrac{1}{2}(x\bar{y} + y\bar{x}).$$

If a and b are two unit vectors, so that Na = Nb = 1, we have

$$N(a + b) = (a + b)(\bar{a} + \bar{b}) = a\bar{a} + b\bar{b} + a\bar{b} + b\bar{a} = Na + Nb + 2a \cdot b = 2 + 2\cos(ab),$$

where (ab) is the angle between the two vectors. N(a + b) takes
the value 1 when (ab) = 120°, and 2 when (ab) = 90°.

Consider the reflection in the hyperplane through the origin
orthogonal to the unit vector a. Since this reflection transforms any
vector x into $x - 2(x \cdot a)a$, it transforms the corresponding quaternion
x into

$$x - (x\bar{a} + a\bar{x})a = -a\bar{x}a$$

[64, p. 308].

Defining $g = \tfrac{1}{2}(1 + i + j + k)$, we can associate the quaternions
1, ig, jg, kg with the nodes of the graph

[graph with nodes 1, ig, jg, kg]

in such a way that two of them, say a and b, satisfy

$$N(a + b) = 1 \text{ or } 2$$

according as the nodes are adjacent or nonadjacent. These four
quaternions generate, by addition and subtraction, the ring of
integral quaternions x whose constituents $x_0, x_1, x_2, x_3$ are either
integers or (all of them) halves of odd integers. Also ig and jg,
satisfying

$$(ig)^3 = 1, \qquad (ig)(jg)(ig) = (jg)(ig)(jg),$$

generate by multiplication the 24 unities

$$\pm 1, \quad \pm i, \quad \pm j, \quad \pm k, \quad \tfrac{1}{2}(\pm 1 \pm i \pm j \pm k),$$

which constitute the binary tetrahedral group of order 24 [24 (6.69)].
Finally, the four reflections

$$x \to -a\bar{x}a,$$

where a = 1 or ig or jg or kg, generate the group $[3^{1,1,1}]$ of order 192
[22, p. 200], which shows that the points representing the 24 unities
are the vertices of the regular 24-cell $0_{111}$ or {3, 4, 3}, and that
the lattice points representing all the integral quaternions are
the vertices of the regular four-dimensional honeycomb $1_{111}$ or
{3, 3, 4, 3}, whose cells are 16-cells (or cross polytopes) $1_{11}$ or
{3, 3, 4} [27a, p. 81].
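
These statements about ig, jg, kg lend themselves to an exact machine check. The sketch below assumes $g = \tfrac{1}{2}(1 + i + j + k)$ (the printed coefficients are partly illegible, so this reading is an assumption) and represents a quaternion by its four rational constituents; the helper names are illustrative.

```python
from fractions import Fraction

def mul(p, q):
    """Hamilton product of quaternions given as (x0, x1, x2, x3)."""
    a, b, c, d = p
    e, f, g, h = q
    return (a * e - b * f - c * g - d * h,
            a * f + b * e + c * h - d * g,
            a * g - b * h + c * e + d * f,
            a * h + b * g - c * f + d * e)

def add(p, q):
    return tuple(s + t for s, t in zip(p, q))

def norm(p):
    return sum(t * t for t in p)

def generated_set(gens):
    """Closure of the generators under multiplication (finite here)."""
    elems = set(gens)
    frontier = list(gens)
    while frontier:
        x = frontier.pop()
        for y in list(elems):
            for z in (mul(x, y), mul(y, x)):
                if z not in elems:
                    elems.add(z)
                    frontier.append(z)
    return elems

ZERO, UNIT, HALF = Fraction(0), Fraction(1), Fraction(1, 2)
ONE = (UNIT, ZERO, ZERO, ZERO)
I = (ZERO, UNIT, ZERO, ZERO)
J = (ZERO, ZERO, UNIT, ZERO)
K = (ZERO, ZERO, ZERO, UNIT)
G = (HALF, HALF, HALF, HALF)          # assumed reading of g
IG, JG, KG = mul(I, G), mul(J, G), mul(K, G)

unities = generated_set([IG, JG])
print(len(unities))                                  # number of unities generated
print(norm(add(ONE, IG)), norm(add(IG, JG)))         # adjacent pair, nonadjacent pair
```

With this choice of g the node 1 is adjacent to each of ig, jg, kg, and those three are mutually nonadjacent.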

This four-dimensional representation of integral quaternions has 
analogues both below (in two dimensions) and above (in eight 
dimensions). In the subgraph 

1 ig 

we can replace the quaternion ig (of period 3) by the complex
number $\rho = e^{2\pi i/3}$. The two complex numbers 1 and $\rho$ are a basis
for the ring of Eisenstein integers, which is represented in the
obvious manner by the lattice of vertices of the regular tessellation
{3, 6} of equilateral triangles. Also $-\rho$ generates the cyclic group
of six unities

$$\pm 1, \quad \pm\rho, \quad \pm\rho^2$$

represented by a regular hexagon {6}. The hexagon is symmetrical
by the reflections

$$x \to -a\bar{x}a = -a^2\bar{x},$$

where a = 1 or $\rho$. These two reflections generate the dihedral
group [3], of order 6.

Octaves (often called Cayley numbers, or Cayley-Graves numbers,
or the Cayley-Dickson algebra) were discovered by Graves and
Cayley, and named by Hamilton. In Dickson’s notation, the
octave q + Qe is derived from quaternions q and Q by adjoining a
new unit e which enters into (nonassociative) multiplication according
to the rule

$$(q + Qe)(r + Re) = qr - \bar{R}Q + (Rq + Q\bar{r})e,$$

where the bars indicate quaternion conjugates.

The conjugate and norm of the octave

$$x = x_0 + x_1 i + x_2 j + x_3 k + x_4 e + x_5 ie + x_6 je + x_7 ke$$

are defined by

$$\bar{x} = x_0 - x_1 i - x_2 j - x_3 k - x_4 e - x_5 ie - x_6 je - x_7 ke$$

and

$$Nx = x\bar{x} = x_0^2 + x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 + x_6^2 + x_7^2.$$

Since

$$\overline{q + Qe} = \bar{q} - Qe = \bar{q} - e\bar{Q} = \bar{q} + \bar{e}\bar{Q}$$

and $N(q + Qe) = (q + Qe)(\bar{q} - Qe) = q\bar{q} + \bar{Q}Q = Nq + NQ$, there
is no conflict between our two uses of the bar and the N. Since
$x_0 = \tfrac{1}{2}(x + \bar{x})$, the octave x (like the quaternion x) satisfies the
quadratic equation

$$x^2 - 2x_0 x + Nx = 0.$$

A ring of integral octaves can be selected in seven ways. One
way uses the basis

$$1, \quad j, \quad e, \quad ke, \quad h, \quad ih, \quad jh, \quad eh,$$

where $h = \tfrac{1}{2}(i + j + k + e)$ [12, p. 569]. Our rule about N(a + b)
applies again to the graph

[graph with nodes 1, j, e, ke, h, ih, jh, eh]

We represent the octave

$$x_0 + x_1 i + x_2 j + x_3 k + x_4 e + x_5 ie + x_6 je + x_7 ke$$

by the vector or point $(x_0, x_1, \ldots, x_7)$ in Euclidean eight-space.
The reflection that reverses a unit vector a is still given by

$$x \to -a\bar{x}a.$$

(Although multiplication is not universally associative, we do have
$ab \cdot a = a \cdot ba$, so that such an expression as aba is unambiguous.)

The four octaves i, j, e, h generate by multiplication the loop of
240 unities

$$\pm 1, \ \pm i, \ \pm j, \ \pm k, \ \pm e, \ \pm ie, \ \pm je, \ \pm ke,$$

$$\tfrac{1}{2}(\pm 1 \pm ie \pm je \pm ke), \quad \tfrac{1}{2}(\pm i \pm j \pm k \pm e),$$

$$\tfrac{1}{2}(\pm 1 \pm j \pm k \pm ie), \quad \tfrac{1}{2}(\pm i \pm e \pm je \pm ke), \quad \text{etc.}$$

(where “etc.” means that we may cyclically permute i, j, k).
Finally, the eight reflections

$$x \to -a\bar{x}a,$$

where a takes the values 1, j, e, ke, h, ih, jh, eh, generate the group
$[3^{4,2,1}]$ of order 192 · 10! [22, p. 204], which shows that the points
representing the 240 unities are the vertices of the semiregular
polytope $4_{21}$. (These numbers 4, 2, 1 indicate the lengths of the
three legs of the graph.) Also the lattice points representing all
the integral octaves are the vertices of the eight-dimensional
honeycomb $5_{21}$, whose cells are simplexes $5_{20}$ and cross polytopes $5_{11}$.

Since a cross polytope of edge 1 has circumradius $\sqrt{1/2}$ while a
simplex has a smaller circumradius, every point in the space is
within a distance $\sqrt{1/2}$ of some vertex of the honeycomb. It follows
that, for every octave $\xi$, there is an integral octave x such that

$$N(\xi - x) \le \tfrac{1}{2}$$

[12, p. 577]. Similar considerations yield the same result for a
quaternion $\xi$ and an integral quaternion x, and for a complex number
$\xi$ and a Gaussian integer x. However, since the circumradius
of an equilateral triangle is only $\sqrt{1/3}$, for every complex number $\xi$
there is an Eisenstein integer x such that $N(\xi - x) \le \tfrac{1}{3}$. (Here
we are using the word “norm” in Hamilton’s sense. Workers in
Hilbert space use the same word for $|\xi - x|$, the square root of
Hamilton’s norm.) It is possible that this geometric approach to
algebraic numbers could be further exploited.

PROJECTIVE GEOMETRY 

In projective geometry the primitive concepts are the point,
the line, and the relation of incidence. The axioms [23, p. 15] tell
us that if AB meets CD (and the four points are all distinct) then
AC meets BD, and so on. If C is not on AB, we can define the
plane ABC as consisting of all the points on all the lines through C
that meet AB, and all the lines that join pairs of these points. If
we include an axiom saying that, for any plane ABC, there is a
point D not in this plane (so that the number of dimensions is
greater than 2), we can prove Desargues’s theorem about perspective
triangles and deduce the uniqueness of the fourth point of a
harmonic set. If, on the other hand, we assume that all points lie
in one plane (so that the number of dimensions is 2), we are free to
assert or deny Desargues’s theorem, that is, to work in a Desarguesian
plane or a non-Desarguesian plane, respectively. A plane
may be non-Desarguesian and still admit a unique fourth harmonic
point for any three given collinear points; it is then called a Moufang
plane [36, p. 370; 46, p. 137].

Another famous theorem that may be either taken as an axiom
or denied is Pappus’s theorem (Figure 3) which says, of the lines
joining six coplanar points, that if AB, CD, EF are concurrent and
DE, FA, BC are concurrent, then also AD, BE, CF are concurrent
[23, p. 90]. (G. Hessenberg’s proof that Pappus implies Desargues
has been published in many versions [e.g., 16, pp. 1, 46], but almost
every author disregards at least one exceptional situation. Apparently
the first complete proof is that of Pickert [47, pp. 144-148].)
Since it is possible to deny Pappus without denying Desargues,
geometries without Pappus can occur in any number of dimensions.

Pappus’s theorem is equivalent to the Fundamental Theorem,
which says that a projectivity is determined by its effect on three
collinear points. This plays an essential role in the classical treatment
of polarities and conics [23, pp. 41-89]. In a projective plane
satisfying Pappus’s theorem, a polarity is an involutory correspondence
between points and lines, such that each range corresponds to a
projectively related pencil. A point A (or B) and the corresponding
line a (or b) are called pole and polar. If B lies on a, then A lies
on b; A and B are called conjugate points, a and b are called conjugate
lines. If a self-conjugate point exists, it lies on a conic whose
points and tangents are precisely all the self-conjugate points and
self-conjugate lines. Other points are said to be exterior or interior
according as they lie on two tangents or on no tangent. The polars
of such points are secants or nonsecants, respectively. On a secant
AB, any pair of conjugate points is a pair of harmonic conjugates
with respect to A and B. (These definitions and deductions, which
are commonplace for the real projective plane, remain valid for the
general Desarguesian plane, and can be adapted to non-Desarguesian
planes [4].)

In the presence of a given polarity, a point Q is said to be accessible 
from a point P if Q is the harmonic conjugate of P with respect to 
some pair of distinct points which are conjugate for the polarity. 
The relation of accessibility is evidently reflexive and symmetric. 
By an application of Hesse’s theorem, it is also transitive [23, pp. 
101, 148], and the points of the plane can be distributed into one or 
more classes of mutually accessible points. 

If the polarity admits self-conjugate points, their locus (the conic)
constitutes a single class, and the exterior points constitute another 
single class. Thus the exterior points on a secant form a segment 
according to the ingenious definition of Pieri [16, p. 38]. Over some 
fields (such as real, rational, or finite) there are also interior points 
(lying on no tangent). We shall see later on that, in any real or 
finite plane, the interior points constitute a single class; but in the 
rational plane the interior points are distributed among infinitely 
many classes. 

One of the most elegant developments in projective geometry
has been the synthetic coordinatization of the general projective
plane. This began in 1847 with von Staudt’s calculus of “throws,”
and continued with Hilbert’s remark that, in any Desarguesian
plane, suitable definitions of addition and multiplication exhibit
the points of a range as a division ring or “skew field” or corpus,
which becomes a (commutative) field when Pappus’s theorem holds.
Further investigation by Oswald Veblen and J. H. M. Wedderburn
inspired Marshall Hall [36, p. 123] to invent the ternary ring which 
enabled him to coordinatize a non-Desarguesian plane. This 
exciting story has been told many times, for example, by Pedoe 
[46, Chap. VI]. 

The most interesting case is that of the Moufang plane, in which
the ternary ring reduces to an alternative division ring: addition is
an Abelian group, both distributive laws hold, and each $a \ne 0$ has
an inverse $a^{-1}$ satisfying

$$a^{-1}a = aa^{-1} = 1, \qquad a^{-1}(ab) = b = (ba)a^{-1},$$

whence a(ab) = (aa)b, (ba)a = b(aa). It has been proved by Bruck
and Kleinfeld [9] that every nonassociative ring of this kind is
simply the ring of octaves over a field. (Earlier, we tacitly assumed
this field to consist of the real numbers.) This aspect of the Moufang
plane has been developed intensively by Freudenthal [31] and
van der Blij [8].

In the algebraic approach to n-dimensional projective geometry
over an arbitrary corpus [53, Part II], a point (x) and a hyperplane
[X] are defined to be sets of “equivalent” symbols

$$(x_1, \ldots, x_{n+1}) \quad \text{and} \quad [X_1, \ldots, X_{n+1}],$$

where the x’s and X’s are not all zero, the rules for equivalence are
that $(x_1, \ldots, x_{n+1})$ is equivalent to $(x_1\lambda, \ldots, x_{n+1}\lambda)$ ($\lambda \ne 0$)
and $[X_1, \ldots, X_{n+1}]$ to $[\mu X_1, \ldots, \mu X_{n+1}]$ ($\mu \ne 0$), and the
condition for incidence is {Xx} = 0, where

$$\{Xx\} = X_1 x_1 + \cdots + X_{n+1} x_{n+1}.$$

It follows that the points collinear with (x) and (y) are $(x\lambda + y\mu)$,
meaning $(x_1\lambda + y_1\mu, \ldots, x_{n+1}\lambda + y_{n+1}\mu)$. In particular, any
three distinct collinear points may be expressed as

$$(x), \quad (y), \quad (x + y).$$

If a fourth point on the same line is $(x\lambda + y)$, the number $\lambda$ is
called the cross ratio of the four collinear points [46, p. 170]. Dually
$\mu$ is the cross ratio of four “coaxial” hyperplanes

$$[X], \quad [Y], \quad [X + Y], \quad [\mu X + Y].$$

If the four points are sections of the four hyperplanes, we have

$$\{Xx\} = 0, \qquad \{Yy\} = 0, \qquad \{Xy\} + \{Yx\} = 0$$

and

$$\mu\{Xy\} + \{Yx\}\lambda = 0, \quad \text{whence} \quad \mu\{Xy\} = \{Xy\}\lambda.$$

Thus we can conclude $\lambda = \mu$ if and only if the corpus is a field. In
other words, the commutative law is logically equivalent to the
invariance of cross ratio under projectivities, to the fundamental
theorem of projective geometry, to Pappus’s theorem [23, p. 38],
to Möbius’s theorem about mutually inscribed tetrahedra [20, p.
258] and to Gallucci’s eight-line theorem [14, pp. 444-445]:

If three skew lines all meet three other skew lines, any transversal
to the first set of three meets any transversal of the second set.

In the commutative case, let {AB, CD} denote the cross ratio $\lambda$
of four collinear points A, B, C, D, so that, if A is (x) and B is (y)
and C is (x + y), then D is $(x\lambda + y)$, which is now the same as
$(\lambda x + y)$. It follows easily [46, p. 171] that

$$\{AB, DC\} = \lambda^{-1} \quad \text{and} \quad \{AC, BD\} = 1 - \lambda.$$

Also, if C and D lie in hyperplanes $\alpha$ and $\beta$, respectively, we write
$\{AB, CD\} = \{AB, \alpha\beta\}$, $\{AC, BD\} = \{A\alpha, B\beta\}$, and so on. If
$\alpha$ is [X] and $\beta$ is [Y], so that (x + y) lies on [X], and $(x\lambda + y)$ on
[Y], we have {Xx} + {Xy} = 0 and $\{Yx\}\lambda + \{Yy\} = 0$, whence

$$\{AB, \alpha\beta\} = \frac{\{Xx\}\{Yy\}}{\{Yx\}\{Xy\}}$$

[39, pp. 120, 136].

According to a famous theorem of Wedderburn, every finite 
corpus is commutative (that is, every finite corpus is a Galois field). 
All known proofs, such as that of Ernst Witt [63], are deeply 
algebraic. Geometers are challenged by the thought that there 
may be a nonalgebraic proof of the equivalent statement that 
Pappus’s theorem (or Möbius’s or Gallucci’s) holds in every finite
Desarguesian plane. Such a thought was doubtless the motivation 
for the large section of Segre’s book [53] that is devoted to line 
geometry over a general corpus. 

If a, b, c are three skew lines, the regulus abc has directrices which
are their transversals, and generators which are transversals of
triads of directrices. If the corpus is not a field, Gallucci’s theorem
fails, and so there are more generators than directrices. Such
considerations provide a basis for the noncommutative theory of
conics, which can be defined as sections of reguli.

One of the earliest instances of a non-Desarguesian plane was
invented by Moulton [43]. However, that has an unpleasant air of
artificiality. Far more satisfying is the example of Segre [51, p. 39],
where the points and lines of a non-Desarguesian projective plane
are represented by the points of the real projective plane and a certain
two-parameter family of cubic curves, namely

$$(X_1^2 + X_2^2 + X_3^2)(X_1 x_1 + X_2 x_2 + X_3 x_3)(x_1^2 + x_2^2 + x_3^2) + \lambda X_3^3 x_3^3 = 0,$$

where the X’s are homogeneous parameters, varying from curve
to curve, and $\lambda$ is a sufficiently small constant.

Before 1892, coordinates were always either real or complex.
Then Fano [29] described an n-dimensional geometry in which the
coordinates belong to the field of residue-classes modulo a prime p,
so that the number of points on a line is p + 1. In 1906 Veblen
and Bussey [61] gave this finite Projective Geometry the name
PG(n, p) and extended it to PG(n, q) where the coordinates belong
to the Galois field GF(q), q being any power of a prime. Many of
the properties of this geometry depend only on the number q + 1 of
points on a line. For instance [53, p. 27], only the most basic
axioms of incidence are needed to prove that the number of k-dimensional
subspaces $S_k$ in the n-space $S_n$ is

$$\frac{(q^{n+1} - 1)(q^n - 1) \cdots (q^{n+1-k} - 1)}{(q^{k+1} - 1)(q^k - 1) \cdots (q - 1)} = \frac{(q^{n+1} - 1)(q^n - 1) \cdots (q^{k+2} - 1)}{(q^{n-k} - 1)(q^{n-k-1} - 1) \cdots (q - 1)}.$$

The former expression is appropriate when $k < \tfrac{1}{2}(n - 1)$, the
latter when $k > \tfrac{1}{2}(n - 1)$. In particular, the finite plane contains
$q^2 + q + 1$ points and (of course) the same number of lines.
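
Both expressions for the number of subspaces are easy to evaluate. The following sketch assumes the first quotient runs over the k + 1 factors $q^{n+1}, \ldots, q^{n+1-k}$ against $q, \ldots, q^{k+1}$, and the second over the n - k factors $q^{n+1}, \ldots, q^{k+2}$ against $q, \ldots, q^{n-k}$; the function names are illustrative.

```python
def subspace_count(n, k, q):
    """Number of k-dimensional subspaces of PG(n, q), first expression."""
    num = den = 1
    for i in range(k + 1):
        num *= q ** (n + 1 - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def subspace_count_alt(n, k, q):
    """The same number computed from the second expression."""
    num = den = 1
    for i in range(n - k):
        num *= q ** (n + 1 - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

for q in (2, 3, 4, 5):
    # points and lines of the plane PG(2, q): both equal q^2 + q + 1
    print(q, subspace_count(2, 0, q), subspace_count(2, 1, q), q * q + q + 1)

print(subspace_count(3, 1, 2))  # lines of PG(3, 2): 35
```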

In 1901 Dickson [26] represented many finite groups, especially
simple groups, in finite projective spaces. About 1940 Brauer
mentioned the desirability of finding a geometric representation for
the Mathieu groups $M_{11}, M_{12}, M_{22}, M_{23}, M_{24}$. This remark
inspired my work [18] which was continued by Todd [60], with the
conclusion that the quintuply transitive groups $M_{12}$ and $M_{24}$ can
be represented as collineation groups in PG(5, 3) and PG(11, 2),
respectively.


ELLIPTIC AND HYPERBOLIC POLARITIES 


In the projective plane over a (commutative) field, a polarity is
given by a bilinear form

$$(xy) = \sum\sum c_{jk}\, x_j y_k, \qquad c_{jk} = c_{kj}, \qquad \det(c_{jk}) \ne 0.$$

Two points (x) and (y), meaning $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$, are
conjugate if and only if (xy) = 0. Since the harmonic conjugates

$$(x \pm \mu y) = (x_1 \pm \mu y_1,\ x_2 \pm \mu y_2,\ x_3 \pm \mu y_3) \qquad (\mu \ne 0)$$

are conjugate (for the polarity) if

$$(xx) = \mu^2 (yy),$$

this relation (for some nonzero square $\mu^2$) is the condition for (x)
to be accessible from (y) [23, pp. 124, 153].

If the quadratic form (xx) is indefinite, self-conjugate points
exist, and their locus is the conic

$$(xx) = 0.$$

Multiplying the $c_{jk}$ (if necessary) by a suitable constant, we can
ensure that, for some particular exterior point (y), (yy) is a square.
Then, since the exterior points constitute a single class of mutually
accessible points, (xx) is a nonzero square for every exterior point
(x) and a nonsquare for every interior point (x) [23, pp. 126, 155;
45, p. 201]. For instance, if

$$(xy) = x_1 y_1 + x_2 y_2 - x_3 y_3,$$

so that $(xx) = x_1^2 + x_2^2 - x_3^2$, the conic (xx) = 0 has tangents
$x_2 \pm x_3 = 0$, and we can take (y) to be their point of intersection
(1, 0, 0).

Over the real field, the statement “(xx) is a nonzero square”
becomes simply “(xx) > 0.”

Over the complex field, every number is a square, so there are no
interior points. Over the real field or any finite field, the product
(or quotient) of two nonsquares is a square, and so the interior
points constitute a single class. Over the rational field, we can find
points (x) for which (xx) takes in turn various prime values; such
points are interior but inaccessible from one another.
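
Over a small finite field the three kinds of points can simply be counted. The sketch below assumes the form $(xx) = x_1^2 + x_2^2 - x_3^2$ of the example, an odd prime q (so that arithmetic modulo q gives the field), and the classification stated above: zero on the conic, a nonzero square at exterior points, a nonsquare at interior points. The comparison values q + 1, q(q + 1)/2, q(q - 1)/2 are the standard counts for a conic in PG(2, q) and are not taken from the text.

```python
def classify_points(q):
    """Count the points of PG(2, q), q an odd prime, by the value of x1^2 + x2^2 - x3^2."""
    squares = {(t * t) % q for t in range(1, q)}
    counts = {"conic": 0, "exterior": 0, "interior": 0}
    # One representative per point: the first nonzero coordinate is 1.
    # Rescaling by lam multiplies (xx) by lam^2, so the class is well defined.
    reps = [(1, a, b) for a in range(q) for b in range(q)]
    reps += [(0, 1, b) for b in range(q)] + [(0, 0, 1)]
    for x1, x2, x3 in reps:
        value = (x1 * x1 + x2 * x2 - x3 * x3) % q
        if value == 0:
            counts["conic"] += 1
        elif value in squares:
            counts["exterior"] += 1
        else:
            counts["interior"] += 1
    return counts

for q in (3, 5, 7):
    print(q, classify_points(q))
```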

The same distinction between real and rational fields occurs when
the polarity has no self-conjugate points, that is, when all the points
of the plane are “interior.” If such an “elliptic” polarity could
exist over a finite field (of odd characteristic), that is, in a finite
plane PG(2, q) (q odd), the number of points in each class of
accessible points would be $\tfrac{1}{2}(q^2 + 1)$, which is not a divisor of
$q^2 + q + 1$, the number of points in the whole plane [3, pp. 123-
124; 23, pp. 101, 149]. Therefore finite geometries of two (or more)
dimensions admit no elliptic polarities [2, p. 144; 52, p. 4].

This result is equivalent to the case n = 3 of the following 
theorem due to Chevalley [11]: 

Every Galois field is quasi-algebraically closed. 

This means that, over a finite field, if $F(x_1, \ldots, x_n)$ is any
homogeneous polynomial of degree less than n, the equation F = 0
has a nontrivial solution.

CONICS IN THE REAL PLANE 

The theory of accessibility shows that, in the real projective
plane, two points on a conic decompose the secant joining them into
two segments, consisting of exterior and interior points, respectively.
The interior segment is naturally called a chord. Figure 4 shows
two points A and B on a chord MN, so arranged that A and M
separate B and N. The polars of A and B, say a and b, join L,
the pole of the line MN, to points A' and B' on this line. These
last points are the harmonic conjugates of A and B with respect to
M and N [23, p. 73]. Hence, if M, N, A, B are

$$(x), \quad (y), \quad (x + y), \quad (\lambda x + y),$$

respectively, then A' is (-x + y) and B' is $(-\lambda x + y)$. Since
the cross ratio of $(\kappa x + y)$, $(\lambda x + y)$, $(\mu x + y)$, $(\nu x + y)$ is

$$\frac{(\kappa - \mu)(\lambda - \nu)}{(\kappa - \nu)(\lambda - \mu)}$$

[23, pp. 119, 152], it follows that

$$\{AB, ba\} = \{AB, B'A'\} = \frac{(1 + \lambda)(\lambda + 1)}{(1 + 1)(\lambda + \lambda)} = \frac{(\lambda + 1)^2}{4\lambda},$$

where $\lambda = \{MN, AB\} = \{AB, MN\}$. This result extends easily
to real projective n-space, with a nonruled quadric instead of a
conic: if $\alpha$ and $\beta$ are the polar hyperplanes of two points A and B
on a chord MN, then

$$\{AB, \beta\alpha\} = \frac{(\lambda + 1)^2}{4\lambda},$$

where $\lambda = \{AB, MN\}$.
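
The substitution behind this formula can be verified with exact rational arithmetic. The sketch assumes the cross ratio of four points with parameters $\kappa, \lambda, \mu, \nu$ is $(\kappa - \mu)(\lambda - \nu) / ((\kappa - \nu)(\lambda - \mu))$, and that A, B, B', A' have parameters 1, $\lambda$, $-\lambda$, -1 as derived above; the function names are illustrative.

```python
from fractions import Fraction

def cross_ratio(kappa, lam, mu, nu):
    """Cross ratio of (kappa x + y), (lam x + y), (mu x + y), (nu x + y)."""
    kappa, lam, mu, nu = (Fraction(v) for v in (kappa, lam, mu, nu))
    return (kappa - mu) * (lam - nu) / ((kappa - nu) * (lam - mu))

def polar_cross_ratio(lam):
    """{AB, B'A'} with A = (x + y), B = (lam x + y), B' = (-lam x + y), A' = (-x + y)."""
    lam = Fraction(lam)
    return cross_ratio(1, lam, -lam, -1)

for lam in (Fraction(1, 3), 2, 3, Fraction(7, 2)):
    lam = Fraction(lam)
    print(lam, polar_cross_ratio(lam), (lam + 1) ** 2 / (4 * lam))
```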


CONICS AND Jfc-ARCS IN A FINITE PLANE 

In any finite projective plane, a set of k points, no three collinear,
is called a k-arc. A k-arc is said to be complete if it is not part of
a (k + 1)-arc. Segre [53, pp. 270-294] has proved that, in a
Desarguesian plane PG(2, q) where q is odd, every (q + 1)-arc
is complete, the (q + 1)-arcs are just the conics, and there are no
complete q-arcs. But complete (q - 1)-arcs occur when q = 7,
9, 11, 13; also complete 6-arcs and 7-arcs occur when q = 9.
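
The easy half of this statement, that a conic is a (q + 1)-arc, can be checked by brute force in small planes. The sketch below is a hedged illustration, not Segre's argument: it assumes q is prime (so that the integers mod q serve as the field), uses the particular conic $x_2^2 = x_1 x_3$, and all function names are hypothetical.

```python
from itertools import combinations


def conic_points(q):
    """Points of the conic x2^2 = x1*x3 in PG(2, q), q prime: (1, t, t^2) and (0, 0, 1)."""
    return [(1, t, t * t % q) for t in range(q)] + [(0, 0, 1)]


def collinear(p1, p2, p3, q):
    """Three points are collinear iff the determinant of their coordinates vanishes mod q."""
    (a, b, c), (d, e, f), (g, h, i) = p1, p2, p3
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return det % q == 0


def is_arc(points, q):
    """True if no three of the given points are collinear."""
    return not any(collinear(r, s, t, q) for r, s, t in combinations(points, 3))


if __name__ == "__main__":
    for q in (3, 5, 7, 11):
        pts = conic_points(q)
        print(q, len(pts), is_arc(pts, q))
```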

In a plane PG(2, q) where q is even (namely a power of 2), the
q + 1 tangents of a (q + 1)-arc concur in a point called the nucleus
(or “center”) of the (q + 1)-arc. The (q + 1)-arc is incomplete,
but yields a complete (q + 2)-arc when its nucleus is added. Each
point of the (q + 2)-arc is the nucleus of the (q + 1)-arc formed by
the remaining points, but these q + 1 points usually do not form a
conic! It has been proved with the aid of an electronic computer
that, when q = 16, some (q + 2)-arcs contain no conic. The same
situation arises when q = 32 or 128 or any higher power of 2, but
the case when q = 64 remains undecided.

In virtue of Wedderburn’s theorem, the only possible coordinates
for a finite non-Desarguesian plane belong to a ternary ring
which is not a corpus. Still denoting the number of points on a 
line by q + 1, we naturally ask what are the possible values for q. 
Although some progress has been made, this remains a challenging 
problem. The number q is called the order of the plane. 

The painstaking work of Tarry [59] on Euler’s Officers’ Problem
[5, p. 190] proves the impossibility of q = 6. It is known [37]
that the only planes of other orders less than 9 are the Desarguesian
planes PG(2, q); but non-Desarguesian planes are known for q = 9
and many greater values, including $p^r$ with p odd and r > 2. In
every known case, q is a power of a prime [36, p. 394]. Bruck and
Ryser [10] proved that there cannot be a plane with $q \equiv 1$ or 2
(mod 4) unless q is expressible in the form $a^2 + b^2$. Since 10 is so
expressible, but 6 not, this result includes Tarry's but leaves open
the possibility of q = 10.
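
The Bruck-Ryser condition is simple enough to tabulate. The sketch below is an illustration only, with hypothetical function names; it lists the orders that the condition excludes and says nothing about orders that pass the test.

```python
from math import isqrt


def is_sum_of_two_squares(q):
    """True if q = a^2 + b^2 for some integers a, b >= 0."""
    for a in range(isqrt(q) + 1):
        rest = q - a * a
        b = isqrt(rest)
        if b * b == rest:
            return True
    return False


def excluded_by_bruck_ryser(q):
    """True if q = 1 or 2 (mod 4) and q is not a sum of two squares."""
    return q % 4 in (1, 2) and not is_sum_of_two_squares(q)


if __name__ == "__main__":
    print([q for q in range(2, 31) if excluded_by_bruck_ryser(q)])
```

As the text observes, 6 is excluded while 10 = 9 + 1 is not.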

In PG(2, q), where q is a power of 2, every complete quadrangle
has collinear diagonal points. Conversely, this property for every
complete quadrangle in a finite plane is sufficient to make the plane
Desarguesian [35]. In PG(2, q), where q is odd, every complete
quadrangle has noncollinear diagonal points. Hanna Neumann
[44] has conjectured that this property for every complete quadrangle
suffices to make the finite plane Desarguesian.

HYPERBOLIC GEOMETRY 

When an ordered space is extended to a real projective space 
(that is, imbedded in a real projective space), each ordered line is 
extended to a projective line (on which the order of points is not 
serial but cyclic). On such a line there is either a single ideal point 
or an interval of ideal points. If every line has a single ideal point 
(its point at infinity) the ordered space is affine [20, Chap. 13]: 
through any point there is a unique line parallel to a given line, and 
we can compare distances along parallel lines. If every line has an 
interval of ideal points, the ordered space is hyperbolic; the ends of
the ideal segments are called points at infinity (or simply ends),
and their locus (in the projective space) is a nonruled quadric
[17, p. 195] or, in the two-dimensional case, a conic. This is Klein's
approach to non-Euclidean geometry: the points and lines are the 
interior points and chords of a nonruled quadric (or conic). 

When the points and lines of the hyperbolic plane are regarded 
as the interior points and chords of a conic in the real projective 
plane, the isometries of the former appear in the latter as collinea- 
tions leaving the “absolute” conic invariant [17, pp. 201-203]. 
In the projective plane, a collineation of period 2 is a harmonic 
homology [23, p. 55], and this leaves a conic invariant if the center 
and axis are pole and polar. When applied to the absolute conic, 
this collineation appears as a reflection or a half-turn according to 
the nature of its axis. If the axis is a secant (so that its pole is 
exterior) the homology is the reflection in this line. If the axis is a 
nonsecant, so that its pole is interior, the homology is the half-turn 
about this point. It follows that perpendicular lines are conjugate 
with respect to the absolute conic. The various lines perpendicular 
to a given line are ultraparallel [55] to one another: their common 
point, being the pole of the given line, is exterior, and thus does not 
belong to the hyperbolic plane itself. 

Any two ends, M and N, determine a line l = MN whose pole L
is the intersection of the tangents at M and N (Figure 5). For any
point P, not on l, the segments PM and PN are the rays from P
parallel to l (in the sense of Gauss, Bolyai, and Lobachevsky): they
do not meet l (in ordinary points), but every ray within the angle
NPM does meet l. This angle is bisected by the line PL, which,
being perpendicular to l, acts as a mirror reflecting PM into PN.
This angle-bisecting technique, combined with continuity, soon
yields a measurement of angles. In particular, $\angle LPM$ is the angle
of parallelism for the point P and line l.

The measurement of distance can be developed from the observation
that two directed segments, AB and A'B', on chords MN and
M'N' of the absolute conic, are congruent if $ABMN \barwedge A'B'M'N'$,
that is, if

$$\{AB, MN\} = \{A'B', M'N'\}.$$

The essential property of a directed distance is its additivity: if 
[ABC], then 

AB + BC = AC. 

Taking into consideration the relation 


$$\{AB, MN\}\{BC, MN\} = \{AC, MN\}$$

[17, p. 77], which holds for any five collinear points, we see that this
requirement is neatly satisfied by defining

$$AB = \tfrac{1}{2} \log \{AB, MN\}.$$

(The $\tfrac{1}{2}$ is inserted for convenience [17, p. 201].) Now let a and b
be the absolute polars of A and B, as in Figure 4. Setting $\lambda = e^{2AB}$
in our formula

$$\{AB, ba\} = \frac{(\lambda + 1)^2}{4\lambda},$$

where $\lambda = \{AB, MN\}$, we deduce

$$\sqrt{\{AB, ba\}} = \frac{e^{2AB} + 1}{2e^{AB}} = \frac{e^{AB} + e^{-AB}}{2} = \cosh AB.$$


Thus the distance AB is given by 

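$$\cosh AB = \sqrt{\{AB, ba\}}, \qquad \operatorname{sech} AB = \sqrt{\{AB, ab\}}, \qquad \tanh AB = \sqrt{\{Aa, Bb\}}.$$

Similarly in n dimensions, if $\alpha$ and $\beta$ are the absolute polar hyperplanes
of A and B,

$$\cosh AB = \sqrt{\{AB, \beta\alpha\}} \quad \text{and} \quad \tanh AB = \sqrt{\{A\alpha, B\beta\}}.$$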
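
A quick numerical check of these identities may be reassuring. The sketch below is illustrative only and its function names are hypothetical. It takes the cross ratio $\lambda = \{AB, MN\} = e^{2AB}$ as input; the expression used for $\{Aa, Bb\}$ is obtained by applying the parameter formula for the cross ratio to the four points with parameters 1, -1, $\lambda$, $-\lambda$, a step worked out here rather than quoted from the text.

```python
import math


def distance(lam):
    """Hyperbolic distance AB = (1/2) log {AB, MN}, for a cross ratio lam > 0."""
    return 0.5 * math.log(lam)


def polar_cross_ratio(lam):
    """{AB, ba} = (lam + 1)^2 / (4 lam)."""
    return (lam + 1) ** 2 / (4 * lam)


def conjugate_cross_ratio(lam):
    """{Aa, Bb} = ((lam - 1) / (lam + 1))^2, from the parameters 1, -1, lam, -lam."""
    return ((lam - 1) / (lam + 1)) ** 2


if __name__ == "__main__":
    for d in (0.1, 1.0, 2.5):
        lam = math.exp(2 * d)
        print(d, math.sqrt(polar_cross_ratio(lam)), math.cosh(d),
              math.sqrt(conjugate_cross_ratio(lam)), math.tanh(d))
```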

In terms of projective (homogeneous) coordinates, let the absolute
quadric have the equation (xx) = 0, where

$$(xy) = \sum\sum c_{jk} x_j y_k, \qquad c_{jk} = c_{kj}, \qquad \det(c_{jk}) \neq 0.$$

Then the polar hyperplane of the point (y) is [Y], where

$$Y_j = \sum c_{jk} y_k.$$

Let A, B, $\alpha$, $\beta$ be (x), (y), [X], [Y], so that

$$\{Yx\} = \sum Y_j x_j = (xy).$$

Then

$$\cosh^2 AB = \{AB, \beta\alpha\} = \frac{\{Yx\}\{Xy\}}{\{Xx\}\{Yy\}} = \frac{(xy)^2}{(xx)(yy)}.$$


Somewhat similar considerations [17, p. 225] yield the formula 


$$\cos^2 (\alpha\beta) = \frac{(xy)^2}{(xx)(yy)}$$

for the angle $(\alpha\beta)$ between two intersecting hyperplanes, $\alpha$ and $\beta$,
whose poles are exterior points (x) and (y) on a nonsecant.
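
The coordinate formula for distance can be tried out on the conic $x_1^2 + x_2^2 - x_3^2 = 0$. The following sketch is an illustration under stated assumptions: the function names are hypothetical, and the test points $(\tanh t, 0, 1)$ are chosen because, on that chord, the formula should reproduce the distance $|s - t|$ and hence be additive.

```python
import math


def form(x, y):
    """(xy) = x1*y1 + x2*y2 - x3*y3 for the absolute conic x1^2 + x2^2 - x3^2 = 0."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]


def interior_point(t):
    """The interior point (tanh t, 0, 1) on the chord x2 = 0."""
    return (math.tanh(t), 0.0, 1.0)


def hyperbolic_distance(x, y):
    """Distance between interior points, from cosh^2 AB = (xy)^2 / ((xx)(yy))."""
    ratio = form(x, y) ** 2 / (form(x, x) * form(y, y))
    # Guard against rounding that would push the ratio slightly below 1.
    return math.acosh(math.sqrt(max(ratio, 1.0)))


if __name__ == "__main__":
    a, b, c = interior_point(-0.5), interior_point(0.2), interior_point(1.0)
    print(hyperbolic_distance(a, b) + hyperbolic_distance(b, c),
          hyperbolic_distance(a, c))
```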


EXTERIOR-HYPERBOLIC GEOMETRY 


As long ago as 1907, Study [55] described the geometry of the 
points exterior to a nonruled quadric (xx) = 0 in real projective 
n-space, that is, of the poles of the hyperplanes in hyperbolic 
n-space. He observed that the join of two such points may be 
either a secant or a tangent or a nonsecant. In the case of the 
secant, we can simply interchange the roles of the primed and 
unprimed letters in Figure 4, and conclude, as before, that 


$$\cosh^2 AB = \frac{(xy)^2}{(xx)(yy)}.$$

In the case of the nonsecant, we borrow a trick from spherical
trigonometry and measure the distance AB by the angle $(\alpha\beta)$
between the polar hyperplanes, so that

$$\cos^2 AB = \frac{(xy)^2}{(xx)(yy)}.$$

In the intermediate case (when the line AB is a tangent), we have

$$(xy)^2 = (xx)(yy)$$

[23, pp. 126, 154, Ex. 6]; so it is natural to define the length of AB 
to be zero.

Dividing the coefficients $c_{jk}$ by the constant (yy), where (y) is a
particular exterior point, we can contrive (as in our section on
“Elliptic and Hyperbolic Polarities”) to make (xx) positive for
every exterior point. Then, dividing the coordinates of each point
(x) by $\sqrt{(xx)}$, we can make (xx) = 1 for every exterior point [17,
p. 281]. In other words, we replace our homogeneous coordinates
by redundant nonhomogeneous coordinates satisfying (xx) = 1,
with the pleasing result that

$$\cosh AB = |(xy)|, \qquad AB = 0, \qquad \text{or} \qquad \cos AB = |(xy)|$$

according as the line AB is a secant, a tangent, or a nonsecant. 

For instance, as we have already seen, (1, 0, 0) is exterior to
the conic

$$x_1^2 + x_2^2 - x_3^2 = 0,$$

so we can assume $x_1^2 + x_2^2 - x_3^2 = 1$ for every point (x) in the
exterior-hyperbolic plane determined by this conic. Study observed 
[55, p. 108] that in a triangle whose sides are lines of the first kind 
(secants), one side is longer than the sum of the other two! 
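
Study's observation can be seen in a single numerical instance. The sketch below is only an illustration: the parametrization `exterior_point(u, phi)` of the normalized exterior points and the particular triangle are assumptions chosen for the example, and all names are hypothetical.

```python
import math


def form(x, y):
    """(xy) = x1*y1 + x2*y2 - x3*y3, normalized so that (xx) = 1 for exterior points."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]


def exterior_point(u, phi):
    """A normalized exterior point (cosh u cos phi, cosh u sin phi, sinh u)."""
    return (math.cosh(u) * math.cos(phi),
            math.cosh(u) * math.sin(phi),
            math.sinh(u))


def secant_length(x, y):
    """Length of AB from cosh AB = |(xy)|, valid when the join is a secant."""
    value = abs(form(x, y))
    if value <= 1:
        raise ValueError("the join of these points is not a secant")
    return math.acosh(value)


if __name__ == "__main__":
    A, B, C = exterior_point(0, 0), exterior_point(1, 0.3), exterior_point(2, 0)
    # The side AC should exceed the sum of the other two sides.
    print(secant_length(A, C), secant_length(A, B) + secant_length(B, C))
```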

RELATIVITY 

In its geometric aspect, the space-time of relativity theory is a 
diagram consisting of a four-dimensional space whose points represent
events. As Synge remarks [57, pp. 5, 6]: “Anything that happens
is an event, but we sharpen the concept to mean an occurrence
that takes up no room and has no duration. . . . All possible
events form a four-dimensional continuum.” The locus of events in
the “life” of any particular particle, being one-dimensional, is a
curve in the four-space. Following Minkowski [42a, p. 55], we call
this curve the world line of the particle. Synge’s most charming
and revealing description of world lines was given in a short paper
in the New Scientist [56].

The passage of time appears in the diagram as a direction along 
the world line of each particle: thus the world line is a directed curve. 
Each event in the history of the particle appears as a point decomposing
the curve into two parts: the past and the future. In particular,
the world line of an unaccelerated particle appears as a
straight line, on which the past and future are two opposite rays. 

In the familiar spaces (Euclidean, affine, projective, non-Euclidean)
all positions are alike and all directions are alike. In the four-dimensional
space-time, all positions are alike (provided we ignore
the “mountain ranges” that represent matter) but directions are of
five essentially different kinds. For each event O, there are the
directions of possible future events of a particle at O, the directions
of possible past events of such a particle, and the directions of
events P which are neither before nor after O. These three sets of
directions are separated by the two nappes of the light cone (or “null
cone,” or “isotropic cone”) whose generators are the world lines of
light signals (or photons) that could be emitted from O or received
at O [57, p. 41]. This cone is, of course, a three-dimensional pseudomanifold
which joins its vertex O by straight lines to all the points
on a nonruled quadric surface like the familiar sphere.

The word cone is appropriate not merely because it gives a rough 
idea of the light rays associated with any event but because the most 
natural geometric spaces to use for our four-dimensional diagram 
are real affine four-space and real projective four-space, in both of
which a quadric cone can be defined and a distinction can be made 
between three sets of lines through the vertex: generators that lie 
entirely on the cone, exterior lines on pairs of tangent hyperplanes, 
and interior lines that lie on no tangent hyperplanes. A generator 
is called a null line (or an isotropic line) because the measurement of 
any interval along it is zero. Other lines are said to be timelike or 
spacelike according as they are interior or exterior. The world line 
of a photon is a null line. The world line of any other unaccelerated
particle is timelike. The join OP of two events that are
essentially incapable of influencing each other is spacelike [57, p. 
i 19; 58, p. 108]. 

Let A, B, C be three unaccelerated particles, initially together
and remaining collinear but parting with relative velocities $v_{BC}$,
$v_{CA} = -v_{AC}$, and $v_{AB}$. In nonrelativistic kinematics we would
have

$$v_{AC} = v_{AB} + v_{BC}.$$

The special theory of relativity changes this to

$$v_{AC} = \frac{v_{AB} + v_{BC}}{1 + v_{AB} v_{BC}/c^2},$$

where c is the velocity of light. The addition rule

$$\tanh (x + y) = \frac{\tanh x + \tanh y}{1 + \tanh x \tanh y}$$

suggested to Robb the idea of using, instead of the velocity v,
the rapidity r, defined by v = c tanh r, so that

$$r_{AB} + r_{BC} = r_{AC}$$

[28, p. 22; 57, p. 126]. 
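
The additivity of rapidities is easy to confirm numerically. In the sketch below, which is an illustration only, the function names are hypothetical and the unit choice c = 1 is an assumption made for convenience.

```python
import math

C = 1.0  # velocity of light; the value 1 is an assumed choice of units


def add_velocities(v_ab, v_bc, c=C):
    """Relativistic composition v_AC = (v_AB + v_BC) / (1 + v_AB v_BC / c^2)."""
    return (v_ab + v_bc) / (1 + v_ab * v_bc / c ** 2)


def rapidity(v, c=C):
    """The rapidity r defined by v = c tanh r."""
    return math.atanh(v / c)


if __name__ == "__main__":
    v_ab, v_bc = 0.6, 0.7
    v_ac = add_velocities(v_ab, v_bc)
    # The composed velocity stays below c, while the rapidities simply add.
    print(v_ac, rapidity(v_ac), rapidity(v_ab) + rapidity(v_bc))
```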

Robb showed in his Appendix [48, p. 405] that the timelike directions
at any event O can be represented by the points of a hyperbolic
three-space in such a way that rapidities appear as distances.
Thus the directions of the world lines of our three particles are
represented by three collinear points A, B, C (Figure 6) and the
equation $r_{AB} + r_{BC} = r_{AC}$ becomes

AB + BC = AC.

It follows that the null vectors through O are represented by the
“points at infinity” of the hyperbolic space, and the light cone at O
is represented by the locus of points at infinity, which is the absolute
quadric: a nonruled quadric in real projective three-space. From
this it is a natural step to complete the picture by using the points
outside the quadric to represent the spacelike directions from O.

Using the projective formula for a hyperbolic distance, we find
that, if $\alpha$ and $\beta$ are the polar planes of A and B, as in Figure 6, the
relative rapidity $r_{AB} = AB$ of these two particles is given by

$$\tanh AB = \sqrt{\{A\alpha, B\beta\}}.$$

Therefore their relative velocity is simply

$$v_{AB} = c \tanh AB = c\sqrt{\{A\alpha, B\beta\}}:$$

a pretty connection between kinematics and projective geometry!
Although Robb [48, p. 22] was aware of the fact that this way of
representing the directions of world lines is valid in the general
theory of relativity, he was chiefly concerned with the special 
theory, according to which the underlying four-space is affine and 
the three kinds of directions from any event are parallel to those 
from any other. He gave a sufficient set of axioms to describe a 
Minkowskian space-time which, from our standpoint, is simply a 
real affine four-space with a pseudo-Euclidean metric. In its projective
aspect, this is a projective four-space in which one hyperplane
has been specialized and called the three-space at infinity.
Two lines are parallel if and only if their points at infinity coincide.
The metric is imposed by specializing a nonruled quadric surface $\Omega$
in this projective three-space. Two lines are orthogonal if and
only if their points at infinity are conjugate with respect to $\Omega$. At
any event O, the light cone joins O to $\Omega$. The timelike and spacelike
lines join O to the interior and exterior points. In fact, this
quadric $\Omega$, which is a common section of all light cones, serves as
the absolute for Robb’s hyperbolic space whose ordinary points 
represent the timelike vectors. 

Using homogeneous coordinates $x_1, \dots, x_5$ in the projective
four-space, we are free to choose the simplex of reference and “unit
point” so that the special hyperplane is $x_5 = 0$ and the quadric
surface $\Omega$ is

$$x_1^2 + x_2^2 + x_3^2 - x_4^2 = 0, \qquad x_5 = 0.$$

Since the x’s are homogeneous, any point of the affine space $x_5 \neq 0$
may be expressed in the form $(x_1, x_2, x_3, x_4, 1)$; then the first four
x’s serve as nonhomogeneous affine coordinates and can be identified
with the familiar x, y, z, ct of the special theory of relativity. The
light cone at any event (a) is

$$(x_1 - a_1)^2 + (x_2 - a_2)^2 + (x_3 - a_3)^2 - (x_4 - a_4)^2 = 0.$$

The line joining two events, A = (a) and B = (b), is timelike, null,
or spacelike according as the number

$$(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2 - (a_4 - b_4)^2$$

is negative, zero, or positive; and the interval or separation AB 
between these two events is defined to be the square root of the 
absolute magnitude of this number. 
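
The classification just described translates directly into a few lines of code. The sketch below is illustrative only; the function names are hypothetical, and events are taken as 4-tuples (x, y, z, ct) in the affine coordinates introduced above.

```python
import math


def separation_number(a, b):
    """(a1-b1)^2 + (a2-b2)^2 + (a3-b3)^2 - (a4-b4)^2 for two events a and b."""
    d = [ai - bi for ai, bi in zip(a, b)]
    return d[0] ** 2 + d[1] ** 2 + d[2] ** 2 - d[3] ** 2


def classify(a, b):
    """Return 'timelike', 'null', or 'spacelike' for the join of two events."""
    s = separation_number(a, b)
    if s < 0:
        return "timelike"
    if s == 0:
        return "null"
    return "spacelike"


def interval(a, b):
    """The separation AB: square root of the absolute magnitude of the number above."""
    return math.sqrt(abs(separation_number(a, b)))


if __name__ == "__main__":
    origin = (0, 0, 0, 0)
    for event in ((0, 0, 0, 1), (1, 0, 0, 1), (2, 0, 0, 1), (3, 0, 0, 5)):
        print(event, classify(origin, event), interval(origin, event))
```

With floating-point coordinates the exact comparison with zero would need a tolerance; integer coordinates are used here to avoid that question.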

For comparison with Synge [57, p. 40], note that instead of
$(x_1, x_2, x_3, x_4)$ we should strictly write $(x^1, x^2, x^3, x^4)$. In fact,
Synge uses $(x_1, x_2, x_3, x_4)$ for the so-called Minkowskian coordinates,
which do not concern us at all [57, p. 56].

As a diagram for relativistic cosmology in the large, Minkowski’s 
world has been almost entirely superseded by de Sitter’s, which, as 
Synge says, “opens up new vistas, introducing us to the idea that 
space (a slice of space-time) may be finite, and this seems to satisfy 
some mental need in us, for infinity is one of those things which we 
find difficulty in comprehending” [58, p. 257]. Timelike lines 
remain infinite, but most of us are willing to accept eternity. 

In the same year that Eddington published the second edition of 
The Mathematical Theory of Relativity [28], Du Val [27] made a 
discovery that has been almost completely ignored, even by Robb 
and Synge. This discovery is that de Sitter’s world, in its polar or 
elliptic form [58, p. 260] is exterior-hyperbolic four-space: the part 
of real projective four-space that lies outside a nonruled quadric 
hypersurface. 

At any event A, the light cone envelops the absolute quadric;
in other words, the null lines are tangents, the timelike lines are
secants, and the spacelike lines are nonsecants (Figure 7). Study’s
observation of the “nontriangle inequality” for a triangle of timelike
lines is thus revealed as an early version of Einstein’s twin
paradox: An astronaut flies away, at a speed $c - \epsilon$, to some distant
planet, and returns, still in the prime of life, to find that his twin 
brother, who stayed at home, is now an old man. 

The polar hyperplane $\alpha$ of A intersects the quadric hypersurface
in a quadric surface, and allows us to represent the lines of all three
kinds through A by the points that are their $\alpha$-sections, just like
the hyperplane at infinity in Minkowski’s world. It is only when
we pass from A to another “observer” B that differences are seen:
the null lines through B are no longer “parallel” to those through A.
But the adjective parallel can still be used for two timelike lines
that have the same point at infinity, that is, two lines that meet on
the quadric hypersurface (as in Figure 8), so that their extensions
inside the quadric are parallel in the sense of classical hyperbolic
geometry. If such lines are the world lines of two unaccelerated
particles A and B, the spacelike interval AB does not remain constant
but decreases asymptotically. On the other hand, two secants
which have in common an interior point P (as in Figure 9) are the
world lines of unaccelerated particles which have a minimax interval
on the polar hyperplane of P and afterwards diverge.

[Figures 7, 8, and 9]

This use of the word minimax has the following justification. 
For each position of A on the first world line there is a corresponding
position of B on the second, namely the foot of the perpendicular
from A in the sense of the exterior-hyperbolic geometry. This is
the point on the second world line that is farthest from A. (Two
other positions of B, lying on tangents from A to the quadric, are
actually at zero distance from A.) This maximum distance AB
attains its minimum when A is conjugate to P, so that AB is perpendicular
to both the world lines.

Similar remarks apply to skew world lines [19] which do not lie in a
plane (but, of course, in a hyperplane). Thus two random particles
(with an infinitesimal probability of having parallel world
lines) will eventually diverge, and “a number of particles initially
at rest will tend to scatter” [28, p. 161]. In fact, we can deduce 
the “expanding universe” from Du Val’s projective diagram without 
invoking any calculation. 

When we do wish to make calculations, we naturally take the
absolute quadric to be

$$x_1^2 + x_2^2 + x_3^2 - x_4^2 + x_5^2 = 0.$$

Since the x’s are homogeneous, we can normalize the coordinates so
that every event (x) satisfies

$$x_1^2 + x_2^2 + x_3^2 - x_4^2 + x_5^2 = 1.$$

[Figure 10]

Then the light cone at any event (a) is

$$x_1^2 + x_2^2 + x_3^2 - x_4^2 + x_5^2 - (a_1 x_1 + a_2 x_2 + a_3 x_3 - a_4 x_4 + a_5 x_5)^2 = 0;$$

for example, the light cone at (0, 0, 0, 0, 1) is

$$x_1^2 + x_2^2 + x_3^2 - x_4^2 = 0,$$

exactly as in the Minkowskian case. The line joining two events, 
A = (a) and B = (b), is timelike, null, or spacelike according as
their polar hyperplanes $\alpha$ and $\beta$ (in the ordinary or “interior”
hyperbolic space) are ultraparallel (having the line AB as their
common perpendicular), parallel, or intersecting (see Figure 10),
that is [16, p. 193], according as the cross ratio

$$\{AB, \beta\alpha\} = (a_1 b_1 + a_2 b_2 + a_3 b_3 - a_4 b_4 + a_5 b_5)^2 = (ab)^2$$

is greater than 1, equal to 1, or less than 1; and the separation AB
between these two events is given by

$$\cosh AB = |(ab)|, \qquad AB = 0, \qquad \text{or} \qquad \cos AB = |(ab)|,$$

respectively [17, p. 281]. It follows [17, p. 249] that the metric
form $\Phi$ [58, p. 1] is

$$dx_1^2 + dx_2^2 + dx_3^2 - dx_4^2 + dx_5^2,$$

where $dx_4$ or $dx_5$ can be eliminated (if we so desire) by means of the
relation

$$x_1\,dx_1 + x_2\,dx_2 + x_3\,dx_3 - x_4\,dx_4 + x_5\,dx_5 = 0.$$
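
For readers who wish to experiment, the separation formulas for normalized events can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the names `SIGNS`, `form`, `normalize`, and `separation` are hypothetical, and the tolerance used to recognize the null case in floating-point arithmetic is an arbitrary choice.

```python
import math

SIGNS = (1, 1, 1, -1, 1)  # signature of x1^2 + x2^2 + x3^2 - x4^2 + x5^2


def form(a, b):
    """(ab) = a1*b1 + a2*b2 + a3*b3 - a4*b4 + a5*b5."""
    return sum(s * ai * bi for s, ai, bi in zip(SIGNS, a, b))


def normalize(x):
    """Scale the coordinates of an exterior point so that (xx) = 1."""
    n = form(x, x)
    if n <= 0:
        raise ValueError("not an exterior point")
    r = math.sqrt(n)
    return tuple(xi / r for xi in x)


def separation(a, b, tol=1e-12):
    """Return (kind, AB) for two normalized events, following the three cases above."""
    value = abs(form(a, b))
    if abs(value - 1) <= tol:
        return ("null", 0.0)
    if value > 1:
        return ("timelike", math.acosh(value))
    return ("spacelike", math.acos(value))


if __name__ == "__main__":
    O = (0, 0, 0, 0, 1)
    P = (0, 0, 0, math.sinh(1.0), math.cosh(1.0))
    Q = (math.sin(0.5), 0, 0, 0, math.cos(0.5))
    R = normalize((1, 0, 0, 1, 1))
    for event in (P, Q, R):
        print(separation(O, event))
```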

Synge [58, p. 262] described de Sitter’s world as a space-time of
constant positive curvature. We have simplified the discussion by
choosing such units of measurement that this curvature is 1 and the 
velocity of light is 1. In other words, our unit of distance is “the 
radius of curvature of the universe” and our unit of time is the time 
light would take to travel this distance. It must be admitted that 
this projective approach yields only the nonorientable “polar” 
form of de Sitter’s world, whereas Synge prefers its orientable covering
manifold, the “antipodal” form. However, all the above
formulae remain valid; the only difference is that two antipodal 
events, such as (0, 0, 0, 0, ±1), are regarded as being distinct instead 
of coincident. 

CONCLUSION 

We have considered projective geometries of two or more dimensions
over various fields, and the possibility of replacing the field
by a corpus or a ternary ring. We might just as well have considered
affine, Euclidean, non-Euclidean, and pseudo-Euclidean
geometries over the same fields. Thus there are many geometries,
each describing another world: wonderlands and Utopias, refreshingly
different from the world we live in.

I hope that these excursions, in a few of many possible directions, 
have helped to reveal the healthy state of development of this 
fascinating subject, including its interactions with other branches of 
pure and applied mathematics.


3

Mathematical Logic 

Georg Kreisel 


The bulk of the material in this chapter consists of technical mathematical
results, some of which have pure mathematical interest.
But they are primarily selected for their use in logical or foundational
studies, for which the subject of mathematical logic was
originally introduced.

The data of foundations consist of the mathematical experience
of the working mathematician. This presents itself as being about
certain mathematical objects such as natural numbers or groups;
the processes involved in mathematical activity present themselves
as primarily mental and abstract (not sensory), but they are recorded
and communicated by use of concrete, that is, spatiotemporal,
symbols. The three elements of mathematical experience just
mentioned are familiar from the so-called realist, idealist, and
formalist views in the philosophy of mathematics, though, traditionally,
the latter are presented as conflicting views on the meaning
of mathematical assertions: cf. below. 

The general problem of foundations is to analyze this experience 
taken as a whole or, at least, large portions of it. The following 
questions are regarded as basic for such an analysis: which, if any,
of the three elements of mathematical experience above are primary
and which are subsidiary? Depending on the answer: what grammatical
rules lead to meaningful expressions? and: what principles
of proof or, more generally, of evidence are valid? 1 These problems 
and several distinctions that are basic in an analysis of the whole 
of mathematics have so far not been found to be equally important 
in the analysis or “foundation” of special branches such as familiar 
number theory or topology. As the subject progresses, new important
problems can be formulated.

Two kinds of reactions to these questions are familiar. One 
regards it as hopeless to find an adequate and precise formulation of 
foundational problems; perhaps because some questions which 
seemed quite precise turned out to be ambiguous in some respects. 
The other is satisfied with quite superficial answers such as old- 
fashioned formalism (cf. Section 3 below); perhaps because the 
questions, though old and therefore natural, have progressed little 
for a long time. 

The main theme of the present exposition is to make out a case 
that real, if modest, progress has been made on foundational problems.
It is not a mere one-shot success story: at least some of the
informal problems and conceptions did not collapse when unexpected
ambiguities and other difficulties were discovered. They
became stable once a few basic distinctions were made. Moreover,
generally the revisions needed were essentially unambiguous: certainly
no less cogent than the steps which have led, for example,
from the atomic conception of matter at the time of the Greeks to
its modern formulation. The particular conception which we use
most to illustrate this is the well-known

Formalist or Mechanist Conception of Mathematics. This conception 
contends that, though reasoning does not present itself to us as 
a manipulation of symbols—for example, words—according to 
mechanical rules, but rather as a series of insights about some 
imagined concrete or abstract structures, all reasoning is fully 

1 And, in the case of mathematics, these questions and their answers do not 
exhaust the questions about reasoning that present themselves to us naturally, 
that is, on the phenomenalistic level, particularly those concerning intelligibility.
Here one does not primarily ask what mathematics is about, but
tries to make the structure of complicated sets of mathematical facts graspable, 
as, for example, in the algebraic approach to (parts of) analysis. In a logical 
analysis, concerned with the validity of reasoning, one must ask what is asserted.

reflected in the manipulation of symbols and their combinatorial
properties. This contention, of course, is consistent with very
general positivistic views; but in the domain of mathematics, it is
supported by a remarkable discovery, not at all obvious from a
superficial examination of ordinary mathematical practice: formalization,
that is, the existence of simple mechanical rules (of
predicate logic and set theory) from which the formalized versions of
ordinary theorems can be derived. Of course, the discovery itself
was not formal or linguistic, since the theorems of ordinary language
and their formal representations do not resemble each other in
external (syntactic) form, but in their meaning. 2 Formalism does
not deny this, but only claims that the significant questions about
reasoning can be treated as combinatorial questions about the mechanical
rules 3 and solved by combinatorially evident methods; and thus all
abstract existential assumptions are eliminated. For details, cf.
Section 3.

Irresoluble Conflicts. The first kind of defeatist mentioned above 
foresees hopeless ambiguity in what would be regarded as significant,
and perhaps even in the meanings of “combinatorial”
or “mechanical”; the other rejects all unanswered questions as
insignificant! Therefore, for both of them the problem of the
correctness of the formalist contention is not real. But although
the theoretical possibility of conflict exists, experience of actual
research shows that in the present case conflicts have been satisfactorily
solved.

On the one hand, Hilbert's program, described in Section 3, shows 
how to formulate (for a suitably sophisticated formalist conception) 
questions of evidence, which, at first sight, seems quite difficult.
And, further, for an astonishingly wide area of mathematics (3.3232) 
they are solved positively, that is, Hilbert's program can be carried 
out. On the other, for negative results, it is necessary, of course, to 
give a searching analysis of the meanings of “combinatorial” and 

2 Indeed, the mechanical rules were not found inductively, but by formulating 
properties of the abstract concepts, like set or construction. This is one reason 
why we put the abstract theory of these notions first (Sections 1 and 2), and 
only afterwards (Section 3) the formalistic (proof theoretic) study of the axioms 
so found. 

3 Somewhat like the aim of elementary mechanics to represent planets as point
masses, ignoring their physical composition and even their shape; it is known 
that the finer details of even their paths cannot be got in this way. 





“mechanical”: for the analyses of 3.4 and 3.5 (involving in a modest 
way the questions of footnote 1), it is shown, in conjunction with 
Gödel's incompleteness theorem, that Hilbert's program cannot be
carried out for certain specific parts of current mathematics. The
simple-minded mechanistic position is refuted by the incompleteness
theorem itself (3.2341).

It should be stressed here that mathematical logic is essential in 
foundational problems not only in the advanced development, but in 
the precise formulation of logical theories; just as the discipline of 
partial differential equations is needed to formulate modern physical 
theories. 

Other Traditional Views. Particularly in view of the limitations of
the mechanist conception, Sections 1 and 2 do not only serve the
purposes mentioned in footnote 2, but are a contribution to other
traditional conceptions of mathematics, namely so-called realism
(or platonism, according to which also ideas are real, that is, external
to ourselves) and idealism (or mentalism, according to which mathematical
objects are thoughts). Here the situation is much less
satisfactory than in the case of formalism; in particular, the known
precise developments, namely those of Sections 1 and 2, are remarkably
special versions of realism and idealism respectively. Also
realism and idealism are truly in opposition only if taken in the 
strict sense, as all-embracing conceptions, that is, that mathematics 
is only about objects external, respectively not external, to us. But they do not
seem to be sufficiently developed to decide between them (or against 
both) nearly as well as in the case of formalism. We postpone the 
discussion to the end (Section 5) where we can refer to the technical 
machinery and specific results of the text, and where general open 
questions are formulated. 


SET THEORY 

The reader will be familiar with the standard techniques of set 
theory. He will find a beautifully readable discussion of the basic 
notions in [35], and detailed treatments in [1; 9]; in particular, a full 
development of the particular laws about sets known as Zermelo’s,
respectively Zermelo-Fraenkel’s, set theory.

In any deductive exposition, one’s view of the nature of the subject
enters at the beginning; the rest is development. The realist
view of mathematics, in particular of its objectivity, determines the
choice of (syntactic) rules for forming meaningful expressions and
for making valid inferences. Implicit in the requirement of objectivity
is this: symbols are to denote well-defined objects a, b, . . .
and relations R, and so R(a, b) is either true or false; also, if c is a
well-defined collection and $(\forall b \in c)$ stands for: for all elements
b of c, then $(\forall b \in c)R(a, b)$ is a well-defined property (of a). Similarly
for the other logical operations: negation $\neg$, conjunction $\wedge$,
implication $\Rightarrow$, existential quantification $\exists$
[note: $(\forall x \in c)A(x) \Leftrightarrow \neg(\exists x \in c)\neg A(x)$].
Part of the “operational significance” of the
requirement of objectivity is that the laws of two-valued (or classical)
logic hold.

The so-called set theoretic foundation of mathematics specializes
this general requirement in two opposite directions. It asserts the
existence of (well-defined) infinite collections, and it restricts itself
to a single basic relation (membership: $\in$). This restriction not
only does not follow from the realist view, but is a priori not even
plausible. It is supported by researches of the last century (reduction
of natural numbers and real numbers to set theory) which the
reader is assumed to know. It is to be remarked that the reduction
was carried out for these two concrete notions, and not for abstract
structures (cf. 1.8 below).

As stated above, the use of classical logic is implicit in the realist 
view, and so are the laws about sets which are used in current 
mathematical practice. The converse cannot be expected to hold,
that is, that the view is implicit in the particular laws used in 
practice. (To consider alternative accounts of this practice is in 
fact the object of Hilbert’s program, cf. Section 3.) To what extent 
the laws used express not only objectivity, but that mathematical 
objects, that is, the particular objects considered here, are external 
to us, is discussed in Section 5. At the present stage we confine the 
analysis of the basic notions to the following remarks on 

Collections, Properties, or Abstract Objects (Structures). Our intention
here is to formulate laws that are valid when the variables
denote collections of objects which are themselves collections, and
“$\in$” denotes membership. We do not require that the laws also
hold if the variables are interpreted to range over properties and
$a \in b$ is read as: $a$ has the property $b$, or: the variables range over
abstract structures and $a \in b$ is read as: the structure $a$ is an
instance of the structure $b$ (for example, $a$ = the structure of the
integers with addition and multiplication, $b$ = general ring structure).
Even if these two interpretations were sufficiently clear to
make such a common theory feasible, we could not expect to get
as many valid laws as for just one interpretation. The only question
is: is the notion of collection not dependent on these other
notions? For example, as is sometimes supposed, is it a mere
analogue to finite collection, or an abstraction from conditions or
properties? One speaks of a wood (a collection) without or before
having counted the trees; so, finiteness is not essential. In fact,
generally, the obvious “property” satisfied by just the elements of
a collection $C$ is simply: belonging to $C$. Next, for a collection to
be regarded as an abstraction from (or the extension of) a property
$P$, $P$ either has to be defined for arbitrary objects or else limited in
advance to some collection $C'$.

In the latter case, $C'$ has to be presupposed, and this is indeed
the usual case in mathematics, where one considers sets of something,
for example, of numbers or of sets of numbers. Here the
primary step is to formulate basic principles of generating collections
(below: collecting into one collection $\mathfrak{P}(x)$ the subcollections of
any given $x$); properties on particular $C'$ can then be explicitly
defined in terms of the collections so generated. This is the principle
of the cumulative type theory below, except that, for a smooth
development, one takes the process: $x \to x \cup \mathfrak{P}(x)$ instead.
Zermelo’s axioms are valid if one starts with an infinite collection of
individuals, and iterates this construction $\omega$ times. As emphasized
in [34] the restriction on the comprehension axiom is not
ad hoc; in this structure, in general there simply is no collection $x$
consisting of precisely those collections which satisfy a given condition
$\Phi$; but by the subset construction there is one satisfying
$\Phi \wedge x \in a$ if $a$ is in the structure. (Of course,
$(\exists y)\forall x[x \in y \leftrightarrow \Phi(x)] \to (\exists a)(\forall x)[\Phi(x) \to x \in a]$.)
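
The generating process described above can be tried out on a very small scale. The following is only a sketch under stated assumptions: the starting collection of individuals is taken to be empty, only finitely many iterations are carried out, and the names `powerset`, `stages`, and `separate` are hypothetical helpers introduced for this illustration. It shows the first few stages of the process $x \to x \cup \mathfrak{P}(x)$, and that the subset construction never leaves a stage while the stage itself is not one of its own members.

```python
from itertools import combinations


def powerset(x):
    """Collection of all subcollections of the finite collection x."""
    items = list(x)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def stages(individuals, n):
    """Iterate x -> x ∪ P(x) n times, starting from the given individuals."""
    x = frozenset(individuals)
    out = [x]
    for _ in range(n):
        x = x | powerset(x)
        out.append(x)
    return out


def separate(a, condition):
    """Subset construction: the members of a that satisfy the condition."""
    return frozenset(x for x in a if condition(x))


# Assumption for the demonstration: no individuals, four iterations.
V = stages((), 4)
print([len(s) for s in V])
```

With these choices every collection obtained by `separate` from a member of the last stage is again a member of that stage, whereas the stage as a whole (the collection of all objects generated so far) is not.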

In the former case there is a genuine problem: what properties
define collections, particularly if properties themselves are to be
regarded as objects. This may be shown by means of the paradoxes.
If $x$ is the collection of objects satisfying a property, take
$\{x\}$ (whose only element is $x$) to be the property regarded as an
object. Let $r$ (for Russell) be any collection satisfying
$(\forall x)(x \in r \to x \notin x)$, and so $r \notin r$. Thus,
$\forall x(x \in r \cup \{r\} \to x \notin x)$,
$r \cup \{r\} \supseteq r$ but $r \cup \{r\} \neq r$. In other words, $r$ is not the collection
of all objects satisfying $x \notin x$.⁴ This result has been (properly)
used to refute a too simple-minded (platonist) realist view.
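
The step from $r$ to $r \cup \{r\}$ can be checked mechanically on finite examples. This is a minimal sketch, assuming Python `frozenset` values as stand-ins for collections (they are automatically well founded, so every one of them satisfies $x \notin x$); `extend` and `all_non_self_members` are hypothetical names chosen for the illustration.

```python
def extend(r):
    """Return r ∪ {r}."""
    return r | frozenset([r])


def all_non_self_members(r):
    """Check that every element x of r satisfies x ∉ x."""
    return all(x not in x for x in r)


# A small collection r all of whose elements satisfy x ∉ x.
empty = frozenset()
r = frozenset([empty, frozenset([empty])])

bigger = extend(r)
print(all_non_self_members(r), r not in r)
print(all_non_self_members(bigger), r < bigger)
```

Whatever such $r$ is chosen, `extend(r)` again consists only of objects satisfying $x \notin x$ and properly contains $r$, so no $r$ of this kind collects all of them. Iterating `extend` from the empty collection produces collections with 0, 1, 2, ... elements.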

1.1. Cumulative Theory of Types. Zermelo's informal derivation
of his axioms can be analyzed by formulating explicitly properties of
the cumulative-type structure described above. The analysis is in
[54], not only for these axioms, but for several other important
cases. The basic notions in terms of which the construction is
described, are:

Set (collection); $a, x, y, \ldots$ (are the corresponding variables)

Iterations (types); $\alpha, \beta, \gamma, \ldots$

Equality: $x = y$, $\alpha = \beta$; membership: $x \in y$; (strict) order:
$\xi < \eta$; $x$ is of type $\xi$, denoted by $(x : \xi)$. Note: “type” means cumulative
type, that is, $x$ is obtained by $\leq \xi$, not only exactly $\xi$ iterations.

Basic laws (NB. They are properties of the structure, not
intended to “define” it axiomatically!) Unlimited quantifiers $\forall\xi$,
$\exists\xi$ are intended to range over all ordinals < some given one $\alpha_0$;
$\forall x$, $\exists y$ over sets of the type structure of type $< \alpha_0$. Otherwise
(for a set theoretic foundation in the strict sense, cf. page 98) the
meaning of quantified expressions would be well defined only on
the assumption that there is a collection consisting of all collections,
which is false for the intended structure. It is not assumed that
sets and ordinals are necessarily distinct kinds of objects. For
individuals $a$, we use the convention $\forall x(x \in a \leftrightarrow x = a)$ to ensure
extensionality. For all $a, b, \alpha, \beta$ (considered) and formulae $\Phi$:

$\forall x(x \in a \leftrightarrow x \in b) \to a = b$ (extensionality)

$\exists y \forall x\{x \in y \leftrightarrow [(x:\alpha) \wedge \Phi(x)]\}$ for all $\Phi$ not containing $y$
(comprehension), though arbitrary properties $\Phi$ are intended;

$\alpha < \beta \vee (\alpha = \beta \vee \beta < \alpha)$ (trichotomy), $(\exists\xi)(\alpha < \xi)$,

$\exists\xi\,\Phi(\xi) \to \exists\xi\{\Phi(\xi) \wedge \neg\exists\eta[\eta < \xi \wedge \Phi(\eta)]\}$ for all $\Phi$ (least element)

$\forall x\{(x:\alpha) \leftrightarrow \exists\xi(\xi < \alpha \wedge \forall y[y \in x \to (y:\xi)])\}$ (collection)

$\exists\xi(a : \xi)$, $\exists x \forall y[(y:\alpha) \leftrightarrow y \in x]$ (type structure)

Infinite types are ensured by:

$\exists\omega\,\exists\xi\,\forall\eta\,\exists\zeta\{\xi < \omega \wedge [\eta < \omega \to (\eta < \zeta \wedge \zeta < \omega)]\}$.

4 One of the set theoretic definitions of ordinals takes the empty set $\emptyset$ to be zero:
$\alpha \to \alpha \cup \{\alpha\}$ as the successor function (and unions for limits). The argument
above is literally the proof that there is no greatest natural number (greatest
ordinal).
A more recondite, but quite basic, principle asserts the existence
of a supremum.

$(\forall\xi < \eta)(\exists\zeta)\Phi(\xi, \zeta) \to (\exists\zeta)(\forall\xi < \eta)(\exists\zeta' < \zeta)\Phi(\xi, \zeta')$ for all $\Phi$.

This “ordinal theoretic” principle is equivalent (by use of
the remaining axioms) to the reflection principle
$\forall\xi\,\exists\eta(\xi < \eta \wedge (\forall x:\eta)(\forall y:\eta) \ldots \{\Phi(x, y, \ldots) \leftrightarrow \Phi^{(\eta)}(x, y, \ldots)\})$
where $\Phi^{(\eta)}$ is obtained from $\Phi$ by restricting all variables in $\Phi$ to be of type $\eta$:
what is true in the whole universe under consideration is already
true at some $\eta$ ($\eta < \alpha_0$).

The proof of the reflection principle from the supremum contains
what might be described as the method of the subject! It is fully
illustrated by considering purely existential $\Phi$, say $\exists z\,\psi(x, z)$, $\psi$ quantifier
free: for each $x$ of type $\xi$, either $\neg\exists z\,\psi(x, z)$ holds in the structure
or there is a least type $\eta_x$ for which: $\exists z[(z:\eta_x) \wedge \psi(x, z)]$. Take the
supremum $\xi_1$, and iterate. Take for $\eta$ the first fixed point of
the mapping $\xi \to \xi_1$.

1.2. First Main Result. The logical consequences of (the properties
of the cumulative type structure in) 1.1 which are purely
set theoretic, that is, those expressed in terms of $\in$ and lower-case
letters, are precisely those of Zermelo-Fraenkel’s set theory [53].

Similarly one obtains an axiomatization of all ordinal theoretic
consequences, that is, those expressible in terms of $<$ and lower-case
Greek letters.

1.3. Converse Results (reductions to set theory). Naturally one
cannot expect to prove that the ordinals are sets or vice versa!
But the familiar definitions of ordinals in set theory show:

1.31. There are purely set theoretic conditions $O(x)$, $L(x, y)$,
$T(x, y)$ [$x$ is an ordinal, $x$ precedes $y$, $x$ is of type $y$ and $O(y)$,
respectively] such that the laws of 1.1 are formal theorems of set
theory when Greek variables are restricted to range over $O$, $\alpha < \beta$ is
replaced by $L(\alpha, \beta)$, and $(x:\alpha)$ by $T(x, \alpha)$.

1.32. (Strengthening). Dedekind showed that if $a, b$ ($b \subset a^2$),
$a', b'$ are two pairs of sets which satisfy Peano’s axioms ($a$ is the
domain of the order relation $b$), then $a, a'$ are isomorphic so as to
map: $b \to b'$. In other words, Peano’s axioms determine the structure
uniquely. Without the axiom of infinity, one can restate the
result as follows: If $N(x)$, $L(x, y)$ and $N'(x)$, $L'(x, y)$ are two
pairs of conditions for which Peano’s axioms can be proved, then an
isomorphism can be defined which maps $N$ onto $N'$, preserving order.
Presumably the same holds for $O$, $L$ above.
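
The idea behind the isomorphism can be illustrated on a finite initial segment. The sketch below is not Dedekind's proof; it assumes two concrete presentations of the natural numbers (Python integers with $k \to k + 1$, and sets with the successor $\alpha \to \alpha \cup \{\alpha\}$ of footnote 4) and pairs them off step by step. The function names are hypothetical.

```python
def successor_set(x):
    """Set theoretic successor x ∪ {x}."""
    return x | frozenset([x])


def isomorphism(zero_a, succ_a, zero_b, succ_b, n):
    """Pair off the first n elements of two zero/successor structures."""
    mapping = {}
    x, y = zero_a, zero_b
    for _ in range(n):
        mapping[x] = y
        x, y = succ_a(x), succ_b(y)
    return mapping


# Integers on one side, finite ordinals as sets on the other.
f = isomorphism(0, lambda k: k + 1, frozenset(), successor_set, 6)

# The order i < j on integers corresponds to membership on the set side.
order_preserved = all(
    (i < j) == (f[i] in f[j]) for i in range(6) for j in range(6)
)
print(order_preserved)
```

On this segment the map is one-to-one and carries the order of the integers to the membership relation between the corresponding sets.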

The other converse result to be considered is this: to give purely
ordinal theoretic conditions $M(\alpha)$, $E(\alpha, \beta)$, $T_1(\alpha, \beta)$ ($\alpha$ is a set,
$\alpha \in \beta$, the set $\alpha$ is of type $\beta$, respectively) such that the axioms of 1.1
are satisfied under this interpretation. This is a more delicate
matter, cf. 1.52; also, constants for certain ordinal functions have
to be introduced since by use of $<$ only ordinals less than $\omega^\omega$ can be
defined.

1.33. There are several quite interesting refinements of these
reductions in the literature relating natural fragments of set theory 
to natural schemes of axioms for other familiar mathematical 
notions; for example, first order arithmetic (Z of [6]) and set theory 
without the axiom of infinity; or classical analysis (Suppl. IV of 
[6]) and set theory with the axiom of infinity, but without the power 
set axiom. 
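
One well-known way to make the connection between arithmetic and set theory without the axiom of infinity concrete is Ackermann's coding of hereditarily finite sets by natural numbers: $m \in n$ is read as “the $m$th binary digit of $n$ is 1”. The sketch below uses this coding as an illustration only; it is an assumption that it matches the spirit of the refinements cited above, not a claim about the particular reductions in [6]. The helper names are hypothetical.

```python
def member(m, n):
    """m ∈ n under Ackermann's coding: bit m of n is set."""
    return (n >> m) & 1 == 1


def elements(n):
    """The codes of the elements of the set coded by n, in increasing order."""
    return [m for m in range(n.bit_length()) if member(m, n)]


def code(elems):
    """The code of the finite set whose elements have the given codes."""
    return sum(1 << m for m in set(elems))


def pair(a, b):
    """Code of {a, b}."""
    return code([a, b])


def union(n):
    """Code of the union of the elements of n."""
    return code(m for e in elements(n) for m in elements(e))


def power(n):
    """Code of the collection of all subsets of n."""
    es = elements(n)
    subsets = [
        code(e for i, e in enumerate(es) if (mask >> i) & 1)
        for mask in range(1 << len(es))
    ]
    return code(subsets)


print(elements(pair(3, 5)), union(pair(3, 5)), power(3))
```

Under this reading pairing, union, and power set are total operations on the natural numbers, every element has a smaller code than the set containing it, and no number codes an infinite set.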

1.4. Changing the Parameters in 1.1. There are three things that
are naturally changed: the starting collection $C_0$ of individuals
(possibly empty), the number $\alpha_0$ of iterations, the kinds of subcollection
taken at each stage.

1.41. Evidently the axioms 1.1 hold for an arbitrary $C_0$, all
$\alpha_0$ satisfying suitable closure conditions, for example, $\alpha_0$ any limit
number $> \omega$ for Zermelo’s own theory, and taking all subcollections.
This was simply the intention.

1.42. (Almost evidently) since we have only denumerably many
axioms in 1.1, by a simple application of the Skolem-Löwenheim
argument, even the supremum axiom is satisfied for a suitable $\alpha_0$ by
taking suitable denumerable subcollections at each $\alpha < \alpha_0$
($\overline{\overline{\alpha_0}} = \aleph_0$).

The problem one sets oneself is this: to find manageable restrictions 
on the collecting process for which the axioms hold; or, more specifically,
restrictions which yield minimal models. Compare the case
of Euclidean space. The intended structure consists of all points (or, 
in terms of coordinates, of all triples of real numbers). However, 
for the study of particular laws, that is, of a particular axiomatic 
theory, it is necessary to introduce manageably restricted models, 
for example, Pythagorean fields, for showing the impossibility of 
ruler-compass solutions. 

Useful specializations of parameters in 1.1, which will be used 
below, are these. 


1.431. $C_0 = \emptyset$: all sets are well founded, that is, there are no
infinite descending $\in$ chains.⁵

1.432. $C_0 = \emptyset$, $\alpha_0 = \omega$. These are the so-called hereditarily
finite sets. They satisfy all the axioms except of course the axiom
of infinity or the reflection principle; the replacement axiom
expresses that a function with a finite domain has a finite range.

1.433. $C_0 = \emptyset$, $\alpha_0 = \Omega$ (first uncountable ordinal), take all
countable subcollections at each stage. Now the power set axiom is
not satisfied, but the replacement axiom is, the countable axiom of
choice being satisfied for the intended structure (cf. [35]).

1.5. Second Main Result. Let the ramified hierarchy on $C_0$ be
defined as follows, for ordinals $\alpha$: (each $R_\alpha$, $\alpha > 0$, is a collection of
collections)

$R_0 = C_0$,

For limit numbers $\alpha$, $R_\alpha = \bigcup_{\beta<\alpha} R_\beta$

$R_{\alpha+1}$: the union of $R_\alpha$ and of all subsets of $R_\alpha$ explicitly definable
by means of symbols for particular elements of $R_\alpha$, $\in$, $=$, propositional
connectives and quantifiers over $R_\alpha$. For the connection with
Russell's theory of orders (Poincaré's predicativity) cf. [33].

All familiar axiom schemes for set theory are satisfied by suitable
$R_\alpha$. More precisely, for $C_0 = \emptyset$,

1.511. Zermelo's axioms are satisfied at 

1.512. Zermelo-Fraenkel's axioms are satisfied at the first inaccessible⁶
ordinal $> \omega$, that is, the first $\xi$ at which the full cumulative
type structure satisfies them (in their intended form, cf. page 101).

1.513. The results extend to additions of axioms expressing the 
existence of larger inaccessible ordinals. 

The proof [5] consists of formulating (set theoretically) the 
definition of the sequence $R_\alpha$ by transfinite induction, that is, giving


5 In this case the axiom of regularity (Fundierung) is valid.

6 In traditional terminology: the least $\xi$ such that for all $\nu, \zeta$ ($\nu < \xi$, $\zeta < \xi$)
and all sequences $\alpha_\rho$ ($\alpha_\rho < \xi$): $\sum_{\rho<\nu} \alpha_\rho < \xi$, $\zeta^\nu < \xi$. $\omega$ is inaccessible, so is 2.
The terminology is intended to suggest that $\xi$ is critical like $\omega$. The results of
1.7 cast doubt on the reasonableness of this terminology. Why should closure
of $\xi$ for these two piddling operations of sums and powers constitute inaccessibility?
Does it really express a significant property of $\omega$? Note that $\omega$ is
also the first ordinal with the property: $\forall\alpha(\alpha < \omega \to \alpha + 1 < \omega)$.
a condition $L(x)$: $\exists\alpha(x \in R_\alpha)$. The proof of 1.511 is slightly different
from 1.512 since one cannot use the ordinals of footnote 4,
only such ordinals $< \omega + \omega$ being assured by Zermelo’s axioms.
One uses well orderings instead. Then the proof that the sets
satisfying $L(x)$ satisfy the set theoretic axioms is straightforward,
by essential use of the argument at the end of 1.1 (and, again,
slightly more complicated in 1.511). An explicit well ordering of
the sets satisfying $L(x)$ is defined by letting $x$ precede $y$ if the first definition
of $x$ precedes the first definition of $y$ in the ramified hierarchy. The
generalized continuum hypothesis ($2^{\aleph_\alpha} = \aleph_{\alpha+1}$) holds when relativized
to $L$, and so do several famous hypotheses from the theory of sets of
points. These are the celebrated results of [5].

1.52. Corollary. The ordinal theoretic analogue to 1.31 holds,
that is, the cumulative theory of types can be interpreted in a theory
of ordinals, cf. [70]. The required relation $E(\alpha, \beta)$ for the interpretation
of $\in$ is taken to be: the set defined by the $\alpha$th definition in
the ramified hierarchy is an element of the set defined by the $\beta$th
definition, and this occurrence of elementhood is expressible by
purely ordinal theoretic notions of [71].

In the early literature on set theory there was considerable (and
perfectly sensible) interest in the question: is the notion of set
definable by (iterated) explicit definitions, as in the ramified
hierarchy, more elementary than the principles of set generation in
the cumulative hierarchy? The result above clarifies the situation
completely as far as familiar principles of set generation are concerned:
If only the principles of definition, not the ordinals (iterations)
are restricted, it makes no difference from the point of view of
consistency.⁷ The problem is shifted to questions about the existence
of ordinals with certain closure properties. It is not reasonable
to use the ramified hierarchy to make the axioms of set theory
itself more evident. But what it does with wonderful efficiency is
to reduce extensions of set theory to set theory.

1.53. Refinements. There are several significant hierarchies in
logic [43]. There is little doubt that all of them are more elegantly
presented as segments $R_\alpha$, in particular as hierarchies of sets of

7 As noted in [5], the argument described above in the form: the ordinals of a
model of set theory $S$ allow one to define a ramified hierarchy satisfying $S$, can
be replaced by a statement with weaker premise and weaker conclusion: the
(formal) consistency of $S$ implies quite finitistically the formal consistency of $S$
together with the assertion: $\forall x\,L(x)$.
higher type and not just of sets of natural numbers. More isolated
applications are these:

1.531. $C_0 = \{a_0, a_1, \ldots\}$, a collection of individuals without
any structure. One easily verifies that all sets of $R_\alpha$ built on such a
$C_0$ are invariant under enough permutations of $C_0$ to show that the
axiom of choice does not hold. (Independence of the axiom of
choice when regularity is not assumed.)

1.532. From 1.511, 1.512. Considering the first ordinal $\alpha_Z$ ($\alpha_{ZF}$
respectively) at which $R_{\alpha_{Z(F)}}$ satisfies Zermelo(-Fraenkel)’s axioms,
one verifies countability⁸ of $\alpha_{Z(F)}$ and further that there is a minimal
model among all models of sets, that is, models in which $\in$ is well
founded. For a direct proof, cf. [21].

1.533. Just as in 1.532 Zermelo’s axioms give a simple characterisation
of $\alpha_Z$ (in fact, the only one we have), so there are simple
sets of axioms of set theory which characterize $\omega_1$: the first nonrecursive
ordinal. Set theory without the power set axiom and
with the replacement axiom predicatively restricted is first satisfied
at $R_{\omega_1}$. A more significant property [65]: For every $\alpha < \omega_1$, there
is an ordering of ordinal $\alpha$ in $R_{\omega_1}$, and all well orderings definable
in $R_{\omega_1}$ have ordinal $< \omega_1$.

1.534. It is easily verified that all sets of natural numbers which
belong to $R_\alpha$ ($\alpha < \alpha_{ZF}$), are definable in the forms: $\forall S \exists T A(n, S, T)$
and $\exists S \forall T B(n, S, T)$, where $S$, $T$ range over all sets of natural
numbers, and $A$, $B$ are purely arithmetic predicates. Conversely,
by [63], if a set is so definable it occurs in the (full) hierarchy and
furthermore, in this case,

$(\forall S_{RA})(\exists T_{RA})A(n, S, T) \leftrightarrow \forall S \exists T A(n, S, T)$

where $S_{RA}$, $T_{RA}$ range only over sets of numbers in the hierarchy:
the ranges considered are different, but the sets defined are the
same. This leads trivially to the conclusion:

1.535. If a statement $\exists S_{RA} \forall T_{RA} \exists U_{RA} A(S, T, U)$ ($A$ arithmetic)
holds, then also $\exists S \forall T \exists U A(S, T, U)$. In proof theoretic
form: if it can be proved using the axiom of choice and/or the continuum
hypothesis, it can also be proved outright. (Poincaré's

8 Countable in the full model. While for given $C_0$ and given abstract ordinal $\alpha$,
$R_\alpha$ is completely determined, a given description of an ordinal will in general
define different ordinals in different $R_\alpha$; for example, the description: the first
$\alpha$ where $R_\alpha$ satisfies the axioms of Zermelo; no such ordinal exists in $R_{\alpha_Z}$, but it
is even countable (by means of mappings) in $R_{\alpha_{ZF}}$.
conjecture is comfortably covered by this!) Thus these axioms are
not needed in “concrete” mathematics, though occasionally a proof
is simpler when they are used, for example in computing homotopy
groups.

1.536. Trivially, there are statements $\forall S \exists T \forall U A(S, T, U)$ which
are true in the ramified hierarchy but not generally (for example,
every set of natural numbers is definable in the ramified hierarchy).
It seems to be open whether every set $\{n : \forall S \exists T \forall U A(n, S, T, U)\}$
occurs in the ramified hierarchy, of course possibly with a different
definition.

The ramified hierarchy uses essentially the smallest collecting
operation; it has been suggested [35] that the maximal collecting
operation, that is, the intended cumulative hierarchy, would yield
complementary results, for example, $2^{\aleph_0} > \aleph_1$ (of course, not for
the axiom of choice). The situation is asymmetrical because a proof
of $2^{\aleph_0} > \aleph_1$ in this structure not only establishes the formal independence
of the continuum hypothesis, but refutes it. Actually,
the formal independence, and much more, was recently established
in [22] by a searching analysis of the ramified hierarchy which uses
1.6. A New Method. All parameters entering into the cumulative
type structure are different from the intended structure! The
collecting operation is that of the ramified hierarchy, the iteration
is stopped at a countable $\alpha_0$ (where $R_{\alpha_0}$ satisfies the axioms of the
set theory considered, cf. 1.532), and the starting collection $C_0$
consists of so-called generic sets. Evidently, for $R_{\alpha_0}(C_0)$, that is, the
ramified hierarchy starting with $C_0$, to satisfy $2^{\aleph_0} \neq \aleph_1$ (or simply
to be $\neq R_{\alpha_0}$), the sets in $C_0$ must not be definable in $R_{\alpha_0}$, though they
may be in $R_\beta$ for some $\beta > \alpha_0$.

The two leading considerations:

1.611. The supremum principle (the basic method of the subject)
for ordinals $< \alpha_0$ and relations defined in $R_{\alpha_0}(C_0)$ can be applied
to verify the axioms of set theory in $R_{\alpha_0}(C_0)$ only if the part (neighborhood)
of $C_0$ “involved” in any particular axiom of set theory
can be defined in $R_{\alpha_0}$.

1.612. The particular statements (continuum hypothesis, axiom
of choice) which are to be false in $R_{\alpha_0}(C_0)$ happen to be existential,
and so one will try to make it hard to satisfy existential statements.

To simplify the statement of the notions introduced in [22] for
handling 1.611, 1.612, recall that just as in 1.52, statements of
$R_{\alpha_0}(C_0)$ can be put into ordinal theoretic form, that is, instead of
using quantifiers over the sets defined from $C_0$ (a collection which
depends on the unknown $C_0$), one uses quantifiers over $\alpha < \alpha_0$
(a fixed range).

1.62. Forcing. Consider a domain $D$ of individuals and relations
on it. Let $C$ be a variable for subsets of $D$, and let $P$ denote
conditions on or neighborhoods of $C$ [of the form $(d \in C, d' \notin C,
\ldots, d'' \in C)$]. Formulae $A$ are built up from primitive formulae
and $C(x)$ by means of $\neg$, $\wedge$, $\exists$. ($\forall A$ is regarded as an abbreviation
for: $\neg\exists\neg A$, $A \to B$ for: $\neg(A \wedge \neg B)$.)

$P \Vdash A$ (read: $P$ forces $A$) is defined by induction:

1.621. $A$ free of $C$: $P \Vdash A$ if $A$ is true in $D$.

1.622. $A = C(d)$: $P \Vdash A$ if $\ulcorner d \in C \urcorner$ is in $P$.

1.623. $A = A_1 \wedge A_2$: $P \Vdash A$ if $P \Vdash A_1$ and $P \Vdash A_2$.

1.624. $A = \neg B$: $P \Vdash A$ if, for all consistent extensions $P'$ of
$P$, not $P' \Vdash B$.

1.625. $A = \exists x B(x)$: $P \Vdash A$ if, for some $d \in D$, $P \Vdash B(d)$.

Note that 1.625 is a strong uniformity requirement since $d$ depends
only on $P$ not on the details of $C$, cf. 1.612.

Evidently, 1.626: $P$ cannot force both $A$ and $\neg A$, 1.627: for every
$P$ there is an extension $P'$, $P' \Vdash A$ or $P' \Vdash \neg A$, 1.628: by an easy
induction, if $P \Vdash A$ and $P' \supseteq P$, $P' \Vdash A$.

1.63. Generic Sets. $C_0$ is called generic, if for every $A$ in the
language considered there is a neighborhood $P$ of $C_0$ such that
$P \Vdash A$ or $P \Vdash \neg A$.

1.631. Existence. Enumerate the formulae $A_1, A_2, \ldots$; take
$P_1$ so that $P_1 \Vdash A_1$ or $P_1 \Vdash \neg A_1$ (1.627), then, $P_2 \supseteq P_1$, so that
$P_2 \Vdash A_2$ or $P_2 \Vdash \neg A_2$ and hence (1.628) $P_2 \Vdash A_1$ or $P_2 \Vdash \neg A_1$, etc.
Take $C_0 = \{d : \exists i(\ulcorner d \in C \urcorner \in P_i)\}$.

1.632. For generic sets $C_0$, $C_0$ makes $A$ true, if some neighborhood
of $C_0$ forces $A$; in fact, $C_0$ satisfies $A_i$ if $P_i \Vdash A_i$. (By induction on
logical complexity of formulae.) Because of this equivalence
between being true and being forced (for generic sets!), by 1.625,
we know that a true existential statement satisfies strong uniformity
conditions.
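
The clauses 1.621 to 1.625 and the construction of 1.631 can be run on a toy example. The following is a sketch under strong simplifying assumptions: the domain $D$ is a three-element set rather than the ordinals below $\alpha_0$, a condition is a finite set of pairs (`d`, `True`) for $d \in C$ and (`d`, `False`) for $d \notin C$, formulae are nested tuples, and only a finite, subformula-closed list of formulae is enumerated. All names (`extensions`, `forces`, `true_in`, `generic_chain`) are hypothetical.

```python
from itertools import product

D = (0, 1, 2)  # toy finite domain (assumption made for the illustration)


def extensions(P):
    """All consistent conditions extending P (P itself included)."""
    decided = {d for d, _ in P}
    free = [d for d in D if d not in decided]
    for choice in product((None, True, False), repeat=len(free)):
        yield P | frozenset((d, v) for d, v in zip(free, choice) if v is not None)


def forces(P, A):
    """P forces A, following clauses 1.621 to 1.625."""
    tag = A[0]
    if tag == "prim":    # 1.621: primitive formula free of C, given by its truth value
        return A[1]
    if tag == "C":       # 1.622: 'd in C' belongs to P
        return (A[1], True) in P
    if tag == "and":     # 1.623
        return forces(P, A[1]) and forces(P, A[2])
    if tag == "not":     # 1.624: no consistent extension forces B
        return not any(forces(Q, A[1]) for Q in extensions(P))
    if tag == "exists":  # 1.625: a witness d depending only on P
        return any(forces(P, A[1](d)) for d in D)
    raise ValueError(tag)


def true_in(C0, A):
    """Ordinary two-valued truth of A when C is interpreted as the set C0."""
    tag = A[0]
    if tag == "prim":
        return A[1]
    if tag == "C":
        return A[1] in C0
    if tag == "and":
        return true_in(C0, A[1]) and true_in(C0, A[2])
    if tag == "not":
        return not true_in(C0, A[1])
    if tag == "exists":
        return any(true_in(C0, A[1](d)) for d in D)
    raise ValueError(tag)


def generic_chain(formulas):
    """1.631: extend the condition step by step so that each formula is decided."""
    P, chain = frozenset(), []
    for A in formulas:
        P = next(Q for Q in extensions(P)
                 if forces(Q, A) or forces(Q, ("not", A)))
        chain.append(P)
    return chain


exists_C = ("exists", lambda d: ("C", d))
formulas = [("C", 0), ("C", 1), ("C", 2), ("not", ("C", 1)),
            ("and", ("C", 0), ("not", ("C", 1))), exists_C, ("not", exists_C)]

chain = generic_chain(formulas)
C0 = {d for P in chain for d, v in P if v}
print(all(true_in(C0, A) == forces(chain[-1], A) for A in formulas))
```

For the formulae in the list, being true of the set `C0` obtained from the chain coincides with being forced by the last condition of the chain, which is the equivalence of 1.632 in this finite setting. In a finite domain total conditions exist, so the example does not show the features that depend on conditions always leaving infinitely much undecided.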

Applying this to the case where $D$ consists of the ordinals $< \alpha_0$
and the relations on $D$ are those of 1.52, one satisfies requirement
1.611 by choosing the sequence $P_i$ in some reasonably regular
fashion; for example, if $C \subset \omega$ and $P$ are to be finite sets, one enumerates
the pairs of disjoint finite subsets $\langle s, t \rangle$ of $\omega$, and takes $P_i$
to be the first set of conditions $\{n \in C : n \in s\} \cup \{n \notin C : n \in t\}$
which $\Vdash A_i$ or $\Vdash \neg A_i$. It is easily verified that for any fixed $A$ the
relation $P \Vdash A$ is definable in $R_{\alpha_0}$. For each of the usual axioms,
the method of supremum shows that $R_{\alpha_0}(C_0)$ satisfies it, if $R_{\alpha_0}$ satisfies
(a suitable finite number of) them.

By delicate choice of $C_0 \subset \beta$ ($\beta$ instead of $\omega$) and use of countability
of $\alpha_0$ (in some $R_\gamma$, $\gamma > \alpha_0$), $R_{\alpha_0}(C_0)$ can be made to satisfy
$2^{\aleph_0} = \aleph_\alpha$ except for $\alpha$ excluded by König’s theorem (that is,
$\alpha = \sum \alpha_n$, $\alpha_n < \alpha$), the negation of the axiom of choice, and so
forth [22].

The method is flexible, for example, instead of finite sets of
conditions, suitable infinite ones (definable in $R_{\alpha_0}$) can be profitably
used [64]. In fact, by more or less ingenious applications, the
formal character of most of the famous open problems of abstract
set theory and the theory of sets of points has been settled: the
answers are different for different (suitably chosen) $R_{\alpha_0}(C)$. At
the time of writing it seems to be still open whether Souslin's
hypothesis can be refuted by means of the usual axioms of set theory.
For questions of formal nonderivability in set theory, the method is as
decisive as the work of Abel and Galois for establishing the impossibility
of ruler-compass (or other geometrical) constructions.

Quite recently [75] a model of set theory has been defined in
which the countable axiom of choice is satisfied but all sets of real
numbers are Lebesgue measurable.

Technical and Heuristic Remarks (for specialists). A more formal
motivation of forcing (1.62) starting with 1.611, 1.612 is this. For
easy comparison with the relevant literature, take $D$ to be $\omega$, and let
$\bar{C}(n)$ denote the sequence $c_0, \ldots, c_{n-1}$ where $c_i = 0$ if $i \in C$,
$c_i = 1$ if $i \notin C$. We think of $C$ as incompletely defined objects.
Therefore, as emphasized at the beginning of this section, the
ordinary logical operations do not make sense for expressions involving
$C$, and so we use different logical symbols; for the moment,
purely formally: $(\ )$ for all; $(E)$ for there is; $\supset$ for implies. All
we assume is that, when applied to expressions not containing $C$,
they reduce to $\forall$, $\exists$, $\to$. Let $C$, $C'$ be variables for incompletely
defined “sets.” Then, to ensure 1.612 we put

1.641. $A_i(C) \supset (Ep)(C')[\bar{C}(p) = \bar{C}'(p) \supset A_i(C')]$,

that is, only finitely many values of $C$ should be involved in each $A_i$,
and

1.642. $(C)(En)A(C, n) \supset (Em)(C)(En)(C')[\bar{C}(m) = \bar{C}'(m) \supset A(C', n)]$,

that is, existential statements should require uniformity. (1.611)
will be automatically satisfied as long as each $\bar{C}(p)$, for fixed $p$, is
definable.

Now, 1.641, 1.642 are, formally, precisely the axioms 5.1 and
5.2 for absolutely free choice sequences [47]. Each condition $P$
is determined by a sequence of 0, 1; $P \Vdash A$, $A$ containing $C$, is:
$(C)[\bar{C}(p) = P \supset A]$ where $p$ is the length of $P$. As usual, $\neg A$ may
be replaced by $A \supset 0 = 1$. Then

1.643. Theorem 4 (iii) of [47] establishes the forcing rules 1.621-
1.625.

In other words, the meaning of formulae $(C)[\bar{C}(p) = P \supset A]$, for
fixed $P$, is expressed in terms of expressions which do not contain
the variable $C$.

Not only are the usual logical operations meaningless if applied
to incompletely defined objects, but 1.641 (and 1.642) are formally
inconsistent if regarded as statements of classical logic: take
$(Ex)(y){\sim}[C(y) \,\&\, {\sim}C(x)]$ for $A_i$ (where ${\sim}B$ is short for $B \supset 0 = 1$).
Conversely, it is easy to verify that if a formula $A$ is provable by
Heyting’s rules, $P \Vdash A$.⁹

1.65. As a counterpart of 1.535 (same notation). If a statement
$\forall S \exists T \forall U A(S, T, U)$ is provable, for example, from the negation of
the continuum hypothesis, it is provable outright. But there is
an asymmetry between the argument here and in 1.535: there the
truth of $\exists S \forall T \exists U A(S, T, U)$ was inferred from the truth in the
ramified hierarchy, here one considers different $R_{\alpha_0}(C_0)$ for different
$S$.


9 The notion of forcing is not a special case of the intuitionistic notions considered
in Section 2: it so happens that, for the languages considered, the usual
formal laws of intuitionistic logic apply also to forcing. [If we consider fixed
ranges of quantifiers, as above, an additional rule is valid, namely the $\omega$-rule:
if $P \Vdash A(0)$, $P \Vdash A(1)$, ... then $P \Vdash \forall x A(x)$ without the corresponding
intuitionistic requirement that the premise be constructively established.] It
is known [11] that the intuitionistic formal laws hold for certain elementary
parts of the theory of open sets in topological spaces. The fact that they hold
for forcing is more useful (or that the forcing relation can be “topologized”)
because nobody is interested in the elementary part of the theory of open sets
per se.
1.66. Similarly to 1.65: the (finitist) relative consistency proof of
[22] is (necessarily) more sophisticated than [5]; in [5] a model $R_\alpha$
could be defined once and for all, and each axiom of set theory shown
to hold when relativized to this $R_\alpha$. In [22] definitions of different
$R_\alpha(C_n)$ have to be given and then one shows that the first $n$ axioms
can be formally proved to be satisfied in $R_\alpha(C_n)$.

1.67. Consequences. The work of [22] differs from the usual
formal independence proofs in logic in being a pure undecidability
result, in contrast to a proof envisaged in [35], cf. end of 1.5. Less
hypothetically, consider Zermelo’s set theory (even with the axiom
of infinity); one can express the statement: there are sets not of
type $\omega + \omega$. This is undecidable because the sets of type $\omega + \omega$
satisfy the axioms, as do the sets of type $\omega + \omega + \omega$. But once
observed, this omission is regarded as an oversight, and the same is
true for axioms asserting the existence of inaccessible ordinals. It
may be of axiomatic interest to show that certain results hold also
for sets of low type, but once one has formulated the cumulative
type structure one has no reason to stop at $\omega + \omega$.

So, the problem is to find new axioms of set theory. The problem
has always existed (particularly before current axioms were found!).
[22] shows that one really has to do so even to decide the continuum
problem, while previously there was only moral certainty; cf. [35]
which is almost wholly devoted to explaining the nature and importance
of this problem. If one were not accustomed to the reduction
of mathematics to the set theoretic vocabulary with the single
primitive $\in$, one would hardly expect to solve the continuum problem
by use of evident axioms in this language. The problem speaks
of all subsets of $\omega$; why should not there be significant statements
about such sets which simply cannot be formulated in this language,
but do express something about $\mathfrak{P}(\omega)$?

The simple-minded conclusion is to look for new primitive notions;
for example, in the marginal mathematical literature one has assertions
about the “set” of interesting or the “set” of feasible numbers!¹⁰
The immediate prospect of success is not good, at least not for one
accustomed to using only the set theoretic vocabulary; he finds it

¹⁰ The system [18] contains (besides $\in$) the primitive predicate $M(x)$: $x$ is a
collection. But the axioms do not express anything specific about noncollections,
and the intended interpretation of $x \in y$ is left open when $M(x) \cdot M(y)$.
Therefore one cannot expect the system to be strong. This is formally established
in [50].

hard to take new primitive notions seriously unless they are defined
set theoretically; and this is self-defeating. This is the reason why, 
for example, the use of infinite formulae does not seem promising in 
this connection. It is to be remarked that the search for new 
primitives is of independent interest. Consider, for comparison, 
Euclidean geometry; if we confine ourselves to the notions of 
betweenness and congruence we actually have a natural set of 
axioms which decides every question in this language [15]. But as a 
theory of (the concept of) space it is inadequate because it lacks the 
differential structure: introducing length of curved paths is introducing
a new primitive.

A much more imaginative suggestion than the use of new primitive
notions is in [35], namely to decide questions about sets of low
type ($\omega + 2$ for the continuum problem) by means of

1.7. Axioms of Infinity (existence of sets of high type). Background:
some relations between higher types and arithmetic.

1.711. In the fundamental paper [32] there are not only formally
undecided statements of arithmetic, that is, with all quantifiers
restricted to $\omega$, but it is shown how simple axioms about higher types
give a formal decision; for example, the consistency of Zermelo’s
set theory is expressed by a purely arithmetic statement, not derivable
in the theory, but derivable from the assertion that there is a
set of all sets of type $\omega + \omega$.

1.712. For $R_\alpha$ countable in $R_\beta$, more sets of natural numbers occur
in $R_\beta$ than in $R_\alpha$ (though not necessarily in $R_{\alpha+1}$).

1.713. (In the notation of 1.53.) In $R_{\alpha_{ZF}}$ there are more sets
of natural numbers than in $R_{\alpha_Z}$.

This illustrates the relevance of higher types in principle for
deciding questions of low type. In specific cases, for example, the
continuum problem $2^{\aleph_0} = \aleph_1$, there is a difficulty; for example,
$R_{\alpha_{ZF}} \cap \mathfrak{P}(\omega)$ strictly contains $R_{\alpha_Z} \cap \mathfrak{P}(\omega)$, but also $R_{\alpha_{ZF}}$ contains
mappings of $\omega$ onto longer initial segments of the ordinals than does
$R_{\alpha_Z}$. Both $2^{\aleph_0}$ and $\aleph_1$ are increased, but by 1.511-1.513, remain
equal, even on the assumption of the existence of the usual kinds of
inaccessible ordinals. There is a generalization of the reflection
principle (1.1) from which these assumptions follow, cf. [19], and
which does not seem to be known to be consistent with the continuum
hypothesis.

1.72. Measurable Ordinals. In various early attempts (for a
survey, cf. [42]) to formulate important properties of $\omega$ in set theoretic
form, of which inaccessibility was one, an initial ordinal $\kappa$ was
called measurable provided

1.721. There is a $\kappa$-additive 2-valued measure on all subsets of
$\kappa$ such that points have measure 0, and $\kappa$ has nonzero measure (for
example, $\omega$ or 2).

1.722. $\aleph_1$-additivity (that is, countable additivity) implies that
some $\kappa_1$ ($\kappa_1 \le \kappa$) is $\kappa_1$-measurable, and so $> \omega$.

Measurable ordinals are inaccessible. But the basic character 
and the possible significance of the former were completely obscure 
until 1960: 

1.731. Essentially any reasonable ordinal $> \omega$ is not measurable
(for methods, cf. [36], precise statements [37]); and

1.732. If $R_{\kappa'}$ satisfies the axioms of set theory, $R_{\kappa'}$ does not
contain a measure on $\kappa$, for any $\aleph_1$-measurable $\kappa > \omega$.

The proof in [62] exploits beautifully the fact that the collecting 
operation of the ramified hierarchy is minimal, cf. 1.536. 

A different method [60] leads to a remarkable sharpening of 1.732: 

1.733. If there are measurable $\kappa > \omega$, $\mathfrak{P}(\omega) \cap R_{\kappa}$ is countable;
in fact, all the ordinals accessible in $R_{\kappa'}$ are countable.

[60] contains various other results, all to the effect that $\aleph_1$ is
large! So, even the assumption of the existence of measurable
cardinals may be insufficient to decide the continuum hypothesis.
[Added in proof: In fact it is [74]: the cardinality of the continuum
is formally not affected by assuming the existence of measurable
ordinals $> \omega$.]

There is a genuine difficulty here. An axiom of infinity deserving
this name must assert the existence of an ordinal which is as different
from $\omega$ as $\omega$ is from 2 (or any other finite number). One will have
to consider carefully what kind of evidence to expect for an axiom
of this kind, since both exaggerated demands and too much “tolerance”
make the subject trivial. A more formal problem is to give a
syntactic characterization (that is, a class $\mathfrak{I}$ of formulae) of the
general form of axioms of infinity. It seems not impossible that set
theoretic truth can be reduced to axioms of infinity, in this sense:
for each set theoretic formula $A$ there is an $A'$, $A'$ in $\mathfrak{I}$, such that
$A \Leftrightarrow A'$ holds in ordinary set theory (with the axiom of choice).
This alone, however, would not help one to decide open questions
since one would not know if $A'$ is true.

1.74. Remark on Model Theory and Recursion Theory. In 1.73
model theoretic methods in the sense of [12] are used in an essential
way, and [36] uses infinitely long formulae. The laws of logic valid
for such “formulae” are expressed (i) syntactically, as (set theoretic)
relations between the sets $\varphi$ used as such formulae, and (ii) semantically,
concerning the satisfaction relation: the structure $a$ satisfies $\varphi$.
Both syntax and semantics need a generalization of recursion theory,
in the sense of a theory of model theoretic invariants (cf. [56] but
also [32]); thus recursion theory appears as a branch of model theory.
Though model theory is a product of set theory, it is not included in
the present report. It is a mathematical discipline in its own right,
much of it no longer primarily concerned with foundations of (“all” of)
mathematics, but with a systematic understanding of particular
branches:

1.741. In algebra [12] it allows a clear separation between general
and specific aspects of a problem: why is the existence of an order
in a commutative field equivalent to a set of inequalities? and why
do we need infinitely many? In analysis, nonstandard Hilbert
spaces (infinitesimals [59]) explain the occurrence of a “point”
spectrum inside continuous spectra in the theory of operators; not
unlike the way the use of the complex plane explains the behavior of
power series on the real axis [for example, $\sum (-1)^n z^{2n}$ diverges at $z = 1$
although $(1 + z^2)^{-1}$ is smooth there].
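The power series remark can be checked numerically. The following is a minimal sketch (the function names are illustrative and not taken from the text): it compares partial sums of the series with the closed form inside the circle of convergence and at $z = 1$, and records that the denominator $1 + z^2$ vanishes at $z = i$, which lies at distance 1 from the origin.

```python
def partial_sum(z, terms):
    """Sum of (-1)**n * z**(2*n) for n = 0 .. terms-1."""
    return sum((-1) ** n * z ** (2 * n) for n in range(terms))

def closed_form(z):
    """The function (1 + z**2)**(-1), smooth on the whole real axis."""
    return 1 / (1 + z * z)

# Inside the circle of convergence the partial sums approach the closed form.
inside = partial_sum(0.5, 60)

# At z = 1 the partial sums alternate between 1 and 0 and never settle,
# although the closed form takes the value 1/2 there.
at_one = [partial_sum(1, k) for k in range(1, 7)]

# The denominator 1 + z**2 vanishes at z = i (and z = -i), at distance 1 from 0.
pole = 1 + 1j * 1j
```

On the real axis nothing singles out $z = 1$; the obstruction only becomes visible once the function is considered in the complex plane.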

1.742. Besides its use as a method, it contains spectacular specific
results such as the far-reaching generalization of the theory of real 
closed and algebraically closed fields in [55], and the beautiful 
characterization of subgroups of finitely presented groups in terms of 
recursion theoretic concepts in [40]. Another striking application, 
to p-adic fields, is made in the recent [73]. 

1.8. Semantics and Set Theory (self-application in the set theoretic
foundations of logic). This is our first example where it is unnatural
to restrict the objects considered to collections in the cumulative
type structure. It is to be expected that the analysis of ordinary
mathematical practice will not force one to introduce a more general
notion of object, since (cf. page 100) much of this practice is developed
from areas where there is a natural type distinction. Logic
is a more promising area since it is supposed to be about arbitrary
objects.

Note that the logical operations are part of set theory; their
elimination in favor of set theoretic operations (boolean and projection)
is a purely formal matter, like deciding that the chicken
comes before the egg. The genuine problem concerns logical
notions like validity, truth, consequence, and so on.

1.81. Truth, validity, and so on, in set theoretic structures. These
notions [16] are required in model theory (1.74). To fix the ideas
consider formulae of predicate logic (PL) with a single (binary)
relation symbol $\mathbf{E}$. Since all definitions are to be given set theoretically,
formulae are certain finite sets (sequences of particular
sets: the symbols). Objects of PL are distinguished from interpreted
formulae of set theory by use of bold-face letters: lower-case
letters denote (formal) variables of PL, capitals denote formulae of
PL, and the (formal) constants $\boldsymbol{\wedge}, \boldsymbol{\neg}, \boldsymbol{\bigwedge}, \boldsymbol{\bigvee}$ are to be interpreted as
$\cdot, \neg, \forall, \exists$ respectively (the last two restricted to some set). A substitution
instance of $\mathbf{A}$ is obtained by replacing the formal constants
as intended, lower-case bold-face letters by ordinary ones, and
$\mathbf{E}(\mathbf{x}, \mathbf{y})$ by some well-formed set theoretic formula $E(x, y)$.

1.811. For each fixed $\mathbf{A}$, $\mathrm{Sat}(x, a, e, \mathbf{A})$ [read: $x \in a$ and $x$ satisfies
$\mathbf{A}$ in the universe $a$ with $e$ ($e \subset a^2$) replacing $\mathbf{E}$] is the substitution
instance $A_a(e)$ of $\mathbf{A}$ when $\mathbf{x}$ is replaced by $x$, all quantifiers restricted
to $a$, and $\mathbf{E}(\mathbf{x}, \mathbf{y})$ by $(x, y) \in e$. If $\mathbf{A}$ is closed (no free variable),
this is independent of $x$, also written: $T(a, e, \mathbf{A})$ [read: $\mathbf{A}$ is true in
the structure $(a, e)$]. Evidently, for example

$(\forall x a e)[\mathrm{Sat}(x, a, e, \boldsymbol{\neg}\mathbf{A}) \Leftrightarrow \neg\,\mathrm{Sat}(x, a, e, \mathbf{A})]$,
or, for closed $\boldsymbol{\bigvee}\mathbf{x}\mathbf{A}$

$(\forall a e)[T(a, e, \boldsymbol{\bigvee}\mathbf{x}\mathbf{A}) \Leftrightarrow (\exists x \in a)\,\mathrm{Sat}(x, a, e, \mathbf{A})]$.

1.812. $V(\mathbf{A})$ is defined to be $(\forall a e)\,T(a, e, \mathbf{A})$, for closed $\mathbf{A}$ (read:
$\mathbf{A}$ is set theoretically valid). Note

1.8121. $V(\mathbf{A}) \Rightarrow (\forall a e)\,T(a, e, \mathbf{A})$ and

1.8122. $T(a, e, \mathbf{A}) \Leftrightarrow A_a(e)$ for each $\mathbf{A}$.
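For finite structures the definitions of 1.811 can be made concrete. The sketch below is an illustration under stated assumptions: formulae of PL are encoded as nested Python tuples (an encoding chosen here, not taken from the text), $a$ is a finite set, $e$ a set of pairs from $a$, and an assignment dictionary plays the role of $x$.

```python
# Hypothetical encoding of PL formulae (an assumption of this sketch):
#   ("E", "x", "y")      atomic formula E(x, y)
#   ("not", A)           negation
#   ("and", A, B)        conjunction
#   ("all", "x", A)      universal quantifier, restricted to a
#   ("ex", "x", A)       existential quantifier, restricted to a

def sat(assign, a, e, formula):
    """Sat(x, a, e, A): the assignment satisfies A in the structure (a, e)."""
    tag = formula[0]
    if tag == "E":
        return (assign[formula[1]], assign[formula[2]]) in e
    if tag == "not":
        return not sat(assign, a, e, formula[1])
    if tag == "and":
        return sat(assign, a, e, formula[1]) and sat(assign, a, e, formula[2])
    if tag in ("all", "ex"):
        var, body = formula[1], formula[2]
        results = (sat({**assign, var: d}, a, e, body) for d in a)
        return all(results) if tag == "all" else any(results)
    raise ValueError(f"unknown formula: {formula!r}")

def true_in(a, e, closed_formula):
    """T(a, e, A) for closed A: no assignment is needed."""
    return sat({}, a, e, closed_formula)

# A three-element structure with e the strict order on {0, 1, 2}.
a = {0, 1, 2}
e = {(0, 1), (0, 2), (1, 2)}

has_successor = ("ex", "y", ("E", "x", "y"))       # free variable x
every_has_successor = ("all", "x", has_successor)  # closed
some_has_successor = ("ex", "x", has_successor)    # closed
```

The two clauses displayed in 1.811 can then be checked on this structure by comparing `true_in(a, e, some_has_successor)` with `any(sat({"x": d}, a, e, has_successor) for d in a)`, and similarly for negation. Set theoretic validity $V(\mathbf{A})$ quantifies over all structures $(a, e)$ and is therefore not computed by this sketch.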

1.82. Truth, validity, and so on, in the universe. There is no
set theoretical formula $\mathfrak{T}(\mathbf{A})$ with variable $\mathbf{A}$ such that: if, for each
fixed $\mathbf{A}$, $A^*$ is the substitution instance of $\mathbf{A}$ with unrestricted
quantifiers, and $\mathbf{E}(\mathbf{x}, \mathbf{y})$ replaced by $x \in y$, then: $\mathfrak{T}(\mathbf{A}) \Leftrightarrow A^*$ (no
truth definition). The reason is that on the one hand we speak of
the infinite collection of PL formulae, on the other hand we use only
finite formulae in set theory. What we have is this: for any fixed
$\mathbf{A}$, there is $\mathfrak{T}_A(\mathbf{B})$ with variable $\mathbf{B}$, such that, for fixed $\mathbf{B}$: if $\mathbf{B}$ is not a
subformula of $\mathbf{A}$, $\neg\mathfrak{T}_A(\mathbf{B})$; if $\mathbf{B}$ is a subformula of $\mathbf{A}$, $\mathfrak{T}_A(\mathbf{B}) \Leftrightarrow B^*$,
that is:

1.821. There are restricted truth definitions for the universe.

1.83. Axiomatic Theory of Logical Validity. The problem to
be considered is the analogue of the reduction of arithmetic to set
theory (1.31): one formulates laws for the basic notions, for example,
logical validity: $\mathrm{Val}(\mathbf{A})$ (corresponding to Peano’s axioms in
1.3), and looks for set theoretic conditions which satisfy these laws,
or are even determined by them.

Rules of inference. Suppose certain rules have been recognized
to be logically valid. The property of being derivable by the rules
is set theoretically (arithmetically) definable, say $P(\mathbf{A})$ (for variable
$\mathbf{A}$).

1.831. $\forall\mathbf{A}[P(\mathbf{A}) \Rightarrow \mathrm{Val}(\mathbf{A})]$ is assumed to be recognized (an
axiom).

1.832. $\forall\mathbf{A}[\mathrm{Val}(\mathbf{A}) \Rightarrow P(\mathbf{A})]$ expresses completeness. (It is rare
that this is obvious except for a narrow class of formulae.)

1.833. $\forall\mathbf{A}[\mathrm{Val}(\mathbf{A}) \Rightarrow V(\mathbf{A})]$ is evident since logical validity
implies set theoretic validity. The converse is problematic since
$\mathrm{Val}(\mathbf{A})$ demands validity in all structures, not only in set theoretic
structures.

1.834. $\forall\mathbf{A}[V(\mathbf{A}) \Rightarrow (\forall e \subset \omega^2)\,T(\omega, e, \mathbf{A})]$ if $\omega$ is in the class of
structures considered (of course, not without the axiom of infinity;
e.g., if $V(\mathbf{A})$ refers to validity in the class of hereditarily finite
sets!).

The basic completeness theorem for the usual rules depends on
these facts: $\forall\mathbf{A}[(\forall e \subset \omega^2)\,T(\omega, e, \mathbf{A}) \Rightarrow P(\mathbf{A})]$ (as a pure “set”
theoretic theorem, not using regularity or the power set axiom, but
using the axiom of infinity), whence, by 1.834 and 1.833, 1.832.
The problematic converse of 1.833 is not needed. By the evident
axioms 1.831, 1.833 the purely set theoretic: $\forall\mathbf{A}[P(\mathbf{A}) \Rightarrow V(\mathbf{A})]$
follows, and may be checked set theoretically. Thus

1.835. The axioms 1.831 and 1.833 determine $\mathrm{Val}(\mathbf{A})$ completely
to be $V(\mathbf{A})$.
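The argument for 1.835 can be laid out as a closed chain of implications. The display below only rearranges the steps already stated (1.831, 1.833, 1.834 and the completeness theorem for the usual rules); it adds no new assumption.

```latex
\begin{align*}
P(\mathbf{A}) &\Rightarrow \mathrm{Val}(\mathbf{A}) && \text{(1.831)}\\
\mathrm{Val}(\mathbf{A}) &\Rightarrow V(\mathbf{A}) && \text{(1.833)}\\
V(\mathbf{A}) &\Rightarrow (\forall e \subset \omega^2)\,T(\omega, e, \mathbf{A}) && \text{(1.834)}\\
(\forall e \subset \omega^2)\,T(\omega, e, \mathbf{A}) &\Rightarrow P(\mathbf{A}) && \text{(completeness theorem)}
\end{align*}
```

Since the chain closes on itself, the four conditions are equivalent; in particular $\mathrm{Val}(\mathbf{A}) \Leftrightarrow V(\mathbf{A})$, without any appeal to the converse of 1.833.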

It is a splendid piece of luck that, for the important class of PL
formulae, validity in arbitrary structures is reduced to validity in
$\omega$ by these evident axioms: a priori it might have depended on the
existence of measurable ordinals $> \omega$! Note that the completeness
proof is conclusive only if stated for the axiomatically introduced
Val, and not the defined $V$.

1.84. Axiomatic Theory of Logical Validity (continued); truth in
the universe. Another schema of laws for the notion of validity:
for each fixed $\mathbf{A}$ (cf. 1.82)

1.841. $\mathrm{Val}(\mathbf{A}) \Rightarrow A^*$, equivalently, by 1.835, $V(\mathbf{A}) \Rightarrow A^*$, provided
only $A^*$ is meaningful. The verification is trickier.

1.8411. The obvious proof by specialization of $V(\mathbf{A})$ runs: take
the universe $V$ for $a$, the membership relation $\in$ for $e$ in $\forall a e\, T(a,
e, \mathbf{A})$! But certainly neither is a set in the sense of the cumulative
hierarchy. We should need: $V \in V$, that is, self-application.

1.8412. For finitely axiomatized set theories $\mathbf{S}$ (in which their own
consistency, that is, $\neg\mathrm{Val}(\boldsymbol{\neg}\mathbf{S})$, cannot be proved), there is an $\mathbf{A}$
for which 1.841 cannot be proved; take $\boldsymbol{\neg}\mathbf{S}$ for $\mathbf{A}$, since $S^*$ is the
(interpreted) conjunction of the axioms.

1.8413. In Zermelo’s set theory (without regularity and without
the axiom of infinity) we have, for each fixed $\mathbf{A}$, $P(\mathbf{A}) \Rightarrow A^*$ (and
hence, with the axiom of infinity, 1.841 by 1.834). The proof uses
so-called cut free rules; cf. Section 3 and 1.821.

There is a second proof using the reflection principle: if $\neg A^*$, there
is a set $a$ such that $\exists e\, T(a, e, \boldsymbol{\neg}\mathbf{A})$, and thus $\neg V(\mathbf{A})$. This is the
natural proof if $A^*$ is intended as an assertion about sets in the
sense of the cumulative type structure. But if $A^*$ is to be about
arbitrary objects, the argument is inconclusive because then the
reflection principle is not evident. The matter is somewhat
scholastic because the meaning of $\in$ itself is then not evident (cf.
footnote 10).

1.842. The result 1.8413 establishes the adequacy of Zermelo's
axioms for treating (certain properties such as 1.841 of) the intuitive
notion of logical validity for the class PL of formulae.

1.9. Remark on Categories, which present two distinct problems.
First, a primarily formal question: can a more elegant exposition be
given by use of primitives other than set and membership, even if
the new ones are definable? This seems not unlikely since the
fundamental notion of morphism is construed as a trinity consisting
of function, domain, and “possible” range; introducing the
last is like distinguishing between the empty set of apples $a$ and
the empty set of pears $p$, and then reintroducing extensionality by
regarding them as $(\emptyset, a)$, $(\emptyset, p)$ respectively. Such a formal discovery
might even be relevant to 1.6 if it turns out that axioms
which are evident for the new primitives are not derivable from the
usual set theoretic axioms. This might happen even if the new
primitives have been defined set theoretically, provided people learn
to use them without constantly going back to the definition.
Second, the broad question of self-application, when, loosely, there
is a category of all categories satisfying a condition P, which itself
satisfies P: This is nothing new in logic (1.8411), nor in mathematics.
Thus the most natural way of thinking of an ordinal is as
an abstract well-ordered structure (order type), and the natural
ordering of these abstract structures is again a well-founded structure
0. That this much makes sense is clear, less so what principles
valid for the cumulative type structure are false for abstract
structures. Candidates are the axiom of choice for abstract structures
or the isomorphism between each well-ordered structure of
abstract ordinal $\alpha$ and the predecessors of $\alpha$ in 0, used in the
Burali-Forti paradox.

It is amusing to observe a connection between 1.8413 and the
reduction of the theory of categories to set theory adopted by its
leading exponents. They use the axiom of universes, that is, the
reflection principle stated for the (infinite) conjunction of all axioms
of Zermelo-Fraenkel set theory. Now, even without knowledge of
the details, it is morally certain that the additional axiom is not
needed. In any particular proof (involving categories) only a finite
number $A_F$ of axioms of set theory are used. One has almost certainly
overlooked the fact that for each such case the reflection
principle is provable in set theory (by use of axioms other than $A_F$)
provided regularity is assumed. So the ideas of the reduction can
be applied to the set in which these axioms hold instead of applying
them to the universe of all sets.

This situation illustrates 1.8; so far mathematical practice does
not force one to consider notions more abstract than those of the
cumulative type structure. This, by itself, does not support the
conclusion that therefore such notions are irrelevant to mathematics.
It is common experience (cf. Section 3) that the first uses of potentially
powerful principles make the exposition clearer, but can be
eliminated; for example, for a long time arithmetic remained constructive,
although the principle of induction permits nonconstructive
uses; and even to this day analysis is exaggeratedly predicative,
that is, uses surprisingly elementary instances of the
principle of the least upper bound.¹¹ There may indeed be a reason

¹¹ The conclusion may be sound if one wants to use logical theory, so to speak,
commercially, as a means of sanctifying (defects of) current practice.

why self-application should be excluded (at least) from (realist)
mathematics; but if so, this reason is not understood.

2. INTUITIONISTIC MATHEMATICS 

The reader is certainly familiar with some examples of constructive
and nonconstructive proofs. Almost all school mathematics
is constructive; specifically, it can be codified in such free
variable systems as primitive recursive arithmetic. The nonconstructive
proofs that will come to his mind are probably so-called
pure existence theorems where it is asserted that (for each $x$) there
is a $y$ with some property P, and one does not know how to find such
a $y$. In many cases one can actually establish that there is no
algorithm of a general kind (for example, definable by means of
recursion equations) for obtaining $y$. Thus, in the case of the word
problem for groups, one considers: for each word $x$ there is a $y$,
$y = 0$ and $x$ is the identity or $y = 1$ and $x$ is not the identity for the
group considered. In analysis the situation is even simpler if one
can show that $y$ is not a continuous function of the real variable $x$,
since then $y$ cannot be computed approximately from approximations
to $x$. This kind of nonconstructivity cannot even be
expressed in free variable formalisms since they do not contain
existential symbols, but consist of arithmetic identities. Though
in principle in the proof of an identity nonconstructive steps, in
fact nonconstructive assertions, may occur, this seems to be avoidable
in current practice. It so happens that when originally nonconstructive
proofs were made constructive [for example, Littlewood’s
theorem on changes of sign of $\pi(x) - \mathrm{li}(x)$ or Artin's
solution of Hilbert’s seventeenth problem], the resulting proofs
could be formulated in primitive recursive arithmetic. These
examples are too crude to serve as typical illustrations for understanding
intuitionistic mathematics. For, the negative cases are so evidently
nonconstructive that $y(x)$ cannot even be defined in the required
manner, let alone be proved constructively to satisfy P for each $x$.
And the positive cases are so evidently constructive that they can
be obtained by quite elementary combinatorial considerations, so-called
finitist proofs (to be considered in Section 3). A distinction
between finitist and intuitionist, first made in [34], is this: in the
former, constructions are applied only to (concrete, that is, spatio-temporal)
configurations, in the latter also to abstract objects (such
as functions and functionals, and, particularly, “logical” operations
on so-called undecided properties and on proofs). Finitist proofs
cannot be relied upon as examples of intuitionistic mathematics,
because one might have quite a clear idea of constructions applied to
concrete objects, and none of the more general constructions: the
former idea is independent of the latter, just as the concept of
finite collection is independent of the general concept of collection.
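The remark on discontinuous functions in analysis can be illustrated by a small sketch. The assumptions are made here for illustration only: a real number is presented by rational approximations $q_n$ with $|x - q_n| \le 2^{-n}$, and the discontinuous function is a unit step (neither choice is taken from the text). Two different arguments then admit identical approximations up to any given depth, although the function takes different values on them.

```python
from fractions import Fraction

def step(x):
    """A discontinuous function of a real argument: 0 below zero, 1 otherwise."""
    return 0 if x < 0 else 1

def approximations(x, depth):
    """Rational approximations q_0 .. q_{depth-1} to x with |x - q_n| <= 2**-n.

    Zero is returned whenever it is close enough; this is one admissible
    choice of approximations, made here for illustration.
    """
    out = []
    for n in range(depth):
        tolerance = Fraction(1, 2 ** n)
        out.append(Fraction(0) if abs(x) <= tolerance else x)
    return out

depth = 10
x0 = Fraction(0)
x1 = -Fraction(1, 2 ** depth)   # negative, yet within 2**-n of 0 for all n < depth

same_data = approximations(x0, depth) == approximations(x1, depth)
different_values = step(x0) != step(x1)
```

Any procedure that inspects only the first `depth` approximations receives identical input in the two cases, so it cannot return the correct value of `step` for both; since `depth` is arbitrary, no finite amount of approximate information about $x$ suffices.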

The introduction below is intended to lead straight to a general 
theory of constructions. If the reader wants some typical examples 
of nonfinitist constructive proofs, he should look at 2.6 (intuitionistic 
mathematics of inductive definitions and ordinals) first. They are 
incidentally mathematically attractive; cf. 2.6411. For a discussion 
of nonfinitist constructive definitions, see 2.72-2.724; the point is
important because the laws of intuitionistic logic are not particularly 
evident for finitist constructions. Background actually assumed: 
an occasional glance at some set of formal rules of intuitionistic 
predicate logic and of primitive recursive arithmetic. The early 
chapters of [8] contain this and more. 

The mathematical theory begins in 2.2. 

Intuitionism is a very narrow version of the idealist conception
of mathematics, not unlike solipsism within general idealist philosophy.
It seems to be the only version that has been developed.¹²
As is to be expected from the solipsist tradition, the criticisms of
rival positions are extravagant and unconvincing, and possibly
merely intended to cover up the real difficulties in formulating a
solipsist position coherently. But it is to be remarked that this
position is quite plausible, at least as a first approximation, when
one is interested in questions of evidence (or in the different question
of intelligibility). The solipsist position stresses the particular evidence
of those ideas which are themselves about other ideas or, more
particularly, about mental acts, and not about external objects.
The question is to what extent this can be developed. (Cf. Section
5 concerning the compatibility of solipsist and realist positions;
of course, not in the strict sense of page 98.) There are still difficulties

¹² There are, of course, material difficulties in stepping from the idea of a thing
to the thing itself. But are there significant formal (mathematical) differences
between them? A possible case arises in connection with the cumulative type
structure which, on the theory presented in Section 1, is not a well-defined
thing (collection), and therefore unrestricted quantification over all sets was
excluded in 1.1 (a hidden parameter $\alpha_0$ being required). If the idea of the whole
hierarchy is regarded as well defined, unrestricted quantification is well defined.
But here the reflection principle nullifies the effect!

in a really coherent formulation, but there is no doubt of the
intrinsic interest of the work done in this connection. This is
presented below.

2.1. General Plan. Suggested by the experience in foundational
research described in Section 1, the principal stages in the general 
plan are these: 

Formulation and (partial) axiomatization of various branches of 
informal constructive mathematics; (i) abstract constructions and 
proofs (2.21 and 2.22 below); (ii) (constructive) functions of finite 
and higher types (2.4); (iii) free choice sequences of natural numbers 
and other objects (2.5, 2.62); (iv) ordinals introduced by so-called 
generalized inductive definitions (2.61); and finally (v) the logical 
operations themselves and (undecided) properties (called species to 
distinguish them from decidable properties) defined by them. 

Relations between these branches in three distinct senses: (i) 
outright definition of one concept in terms of others, familiar, for 
example, from the conversion of implicit definitions into explicit 
ones; (ii) contextual definitions (for example, 1.31 or 1.32); and 
(iii) proof theoretic reductions or models (for example, 1.52). The 
most important examples of (ii) concern free choice sequences and 
generalized inductive definitions, and of (iii) concern reductions of 
higher type theories to arithmetic or analysis. 

Finally, isolation of primitive concepts, in terms of which the 
others can be defined, and laws (axioms) for these primitives. Current
candidates are construction (function) and the application
operation with proof as a suppressed parameter.¹³ It will be seen in
2.2 that these primitives are sufficient to express requirements, just 
as Peano’s axioms formulated in the language of sets and relations, 
express what properties a set N and a relation d N must have to 
serve as the structure consisting of the natural numbers and successor
relation. But, in contrast to Section 1, we do not yet know
simple axioms, valid for these primitives, whereby objects can be 
defined to satisfy the requirements. This is the principal problem 
(cf. 2.8). 

A summary of proof theoretic results is given in 2.63, which
exhibit remarkable interreducibility between the various branches,
as developed so far. A more general way of looking at these results

¹³ As ordinal and order of the cumulative type theory are suppressed in the
practice of set theory. The occurrence of such hidden parameters seems essential
in work that aims to give an analysis of informal mathematics.

in terms of the distinction between abstract and mechanical constructions
is given in the discussion of Church’s thesis in 2.7.

The following general observations concern common misunderstandings
of the intuitionistic position.

2.11. Logical Operations. In contrast to Section 1, their meaning
(particularly of quantification and implication) is genuinely problematic
for the present (solipsist) approach. Thus, we make assertions
about all numbers; something here and now is supposed to
constitute evidence for this. Whatever else is in doubt, mathematical
practice makes it clear that this something is (a comprehended)
proof. So proof must enter into the basic explanation.

2.12. Proofs. For the present position, a so-called formal proof
(sequence of symbols constructed according to a mechanical pattern)
is a representation or description of a proof, and not the proof itself;
just as a symbolic description of a set, for example, $\{0, \omega\}$, is not
generally the set described.

2.121. In particular, formal rules do not express the intended
meaning of correct inference, but are used only because they have
been recognized to be valid (cf. 1.832). It should be noted that
currently formal rules (of inference) are often given in the first chapter
of a mathematical text, for example, Bourbaki, for the sake of
precision; but this is demanded by a particular philosophical concept
of precision (cf. Section 3) and not by mathematical practice;
if it were, the later chapters of the text would have to refer back
to the rules, and they do not.

2.122. To formulate (partial) laws about proofs we shall use 
formal systems, as in Section 1. It may so happen that the proofs 
represented by the formal derivations in the systems themselves 
also satisfy the laws. [Just as, for some set theoretic laws, the sets 
explicitly definable (on the basis of these laws) constitute a model.] 
But this is not necessarily so, for example, 2.61. 

2.123. It so happens that in set theory the equality relation 
between sets (and the axiom of extensionality which is relevant) is 
quite basic. In intuitionistic mathematics this is less so, and often 
so-called intensional equality is more important. In particular, for 
proofs, equality is seldom considered and therefore one often speaks 
of provability.¹⁴

¹⁴ But there may be an interesting theory: mathematicians interested in priority
questions often discuss, in apparently objective terms, whether or not two proofs
are the same.

2.13. Proofs of What? If the logical operations, in terms of
which the usual assertions are built up, are not primitive but 
explained, then the basic proofs must be proofs of special assertions 
in which the (problematic) logical operations are not involved. One 
such special kind is familiar from quantifier-free mathematics, for 
example, quantifier-free arithmetic, which consists of assertions 
A(n) for decidable numerical properties A(n). In intuitionistic 
mathematics this is generalized. Instead of numbers, we have 
arbitrary mathematical objects, that is, (ideas of) concrete objects 
and constructions; instead of numerical properties, we have notions, 
that is, understood, decidable properties of mathematical objects. 

A notion is denoted by a symbol $v$, and $v(a) = 0$ if the object $a$ has
$v$, $v(a) = 1$ otherwise; for example, the property of being a finite
sequence of 1's is a notion. Since notions are decidable, the usual
truth functional operations may be applied to equations $v(a) = 0$,
$v(a) = 1$.

2.131. Type Structure. [(0) is a finite type, and if $\sigma$, $\tau$ are finite
types, so is $\sigma^\tau$.] For each notion $v$ we introduce the finite type
structure on $v$ as follows: $v_{(0)}(a) = v(a)$. Suppose $v_\sigma$, $v_\tau$ are already
introduced, and $v_\sigma(a^\sigma) = 0$, $v_\tau(a^\tau) = 0$ for some $a^\sigma$, $a^\tau$. Let $\xi$ stand
for $\sigma^\tau$. For any given object $x$, associate $x^{(\xi)}$ by the definitions:

If $v_\tau(a) \ne 0$, $x^{(\xi)}(a) = a^\sigma$

$v_\tau(a) = 0 \cdot v_\sigma[x(a)] = 0 \Rightarrow x^{(\xi)}(a) = x(a)$

$v_\tau(a) = 0 \cdot v_\sigma[x(a)] \ne 0 \Rightarrow x^{(\xi)}(a) = a^\sigma$.

Then $x^{(\xi)}$ is of type $\sigma^\tau$ by definition, and we have a well-defined
notion $v_\xi$ of objects which are of type $\xi$ by definition. For, if $x$ is
any clearly conceived object, it is either conceived to be introduced
in this fashion or not! It is possible, of course, that $v_\xi(x) \ne 0$,
but that we have a proof asserting: $v_\tau(a) = 0 \Rightarrow v_\sigma[x(a)] = 0$.

2.132. Extensions and Contractions of Functions. Given a notion
$v$ and an object $a_0$: $v(a_0) = 0$, and an operation $b$ defined on $v$. Then
$b$ can be extended to an everywhere defined operation $b^*$ with values
in $v$ by the stipulation: $b^*(a) = b(a)$ if $v(a) = 0$, $b^*(a) = a_0$ if
$v(a) = 1$, where “if” is applied to a decidable condition. Conversely,
if $b$ is everywhere defined, we contract it to $b^\circ$: $b^\circ(a) = b(a)$
if $v[b(a)] = 0$, $b^\circ(a) = a_0$ otherwise; then $b^\circ$ is $v$-valued by definition.
Naturally, the explanations above (of being of type $\sigma^\tau$ by definition)
will not be reproduced within the formal development, but embodied
in the formation rules and axioms to be given; in this case, in the
use of truth functions applied to equations between terms of
so-called higher type [34].
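The two stipulations of 2.132 translate directly into code. The sketch below is illustrative only: the notion `is_natural` and the operations `successor` and `halve` are examples chosen here, not taken from the text, and a Python function is of course only a stand-in for an everywhere defined operation.

```python
def extend(b, v, a0):
    """b* of 2.132: agrees with b where v(a) == 0, returns a0 elsewhere."""
    def b_star(a):
        return b(a) if v(a) == 0 else a0
    return b_star

def contract(b, v, a0):
    """b° of 2.132: keeps b(a) when that value has the notion v, else a0."""
    def b_circ(a):
        value = b(a)
        return value if v(value) == 0 else a0
    return b_circ

# Illustrative notion (an assumption of this sketch): being a natural number.
def is_natural(a):
    ok = isinstance(a, int) and not isinstance(a, bool) and a >= 0
    return 0 if ok else 1

def successor(n):
    """An operation defined on the naturals, with natural values."""
    return n + 1

def halve(a):
    """Everywhere defined here, but only sometimes natural-valued."""
    if is_natural(a) == 0 and a % 2 == 0:
        return a // 2
    return None

successor_star = extend(successor, is_natural, 0)   # total, natural-valued
halve_circ = contract(halve, is_natural, 0)         # natural-valued by definition
```

For instance `successor_star("abc")` returns the default object 0 instead of failing, and `halve_circ(7)` returns 0 because `halve(7)` does not have the notion. In both cases the case distinction is applied to a decidable condition, as the text requires.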

2.14. A Fundamental Principle Implicit in 2.13. The meaning of:
$p$ is a proof of $v(a) = 0$ for all $a$, is understood for notions $v$; in other
words, we recognize a proof of an assertion of this form when we
see one. Writing $\pi[p; x \cdot v(x)]$ (“$x \cdot$” to denote binding of “$x$”)
for this relation, we have

2.141. If $v$ is a notion, so is $\pi[p; x \cdot v(x)]$ where there may be other
variables in $v(x)$, and $x$ may be a sequence, not only a single variable.

This principle is embodied in the usual formal systems where, for 
any particular (representation of a proof by a) sequence of sym¬ 
bols, it can be decided whether it proves (the assertion expressed 
by) any given formula. In addition, formal systems require 
the decision (i) to be mechanical, and (ii) for arbitrary formulae
(not only universal ones). Below we shall derive (ii) by means of
2.141 from our interpretation of logical constants; we do not assume
(i), although most known (partial) laws about proofs can be shown
to be compatible with (i). 15

2.15. The Meaning of the Primitive Concepts. It is not to be
expected that we have a clear idea of the extension of the concept of
mathematical object; 2.132 shows that doubts about the extension
need not affect particular conclusions. 16 But the application operation
requires a careful interpretation.

2.151. Total Functions . By 2.13 we want our functions to be 
total, that is, all terms to denote some object, since otherwise equal¬ 
ity ( t = s) would not be decided. Then one would have the prob¬ 
lem of discovering the logical laws for partially defined predicates, 
which (by 2.13) we wish to avoid. Now, the ordinary meaning of 
application a(b) has to be extended; for example, if a is conceived 
as a function of two arguments and $b$ as a single object, and not a


15 This remarkable discovery, including various completeness results, constitutes
much of the support of the mechanistic (formalist) position; cf. Section
3. And this has since been used in the sense of footnote 11.

16 The difficulty is rather to find strong axioms for the primitive ideas. The
situation is analogous to the state of set theory before the meaning of set in the 
sense of the cumulative type structure was clearly recognized. Such funda¬ 
mental principles as the power set construction or the axiom of choice were then 
genuinely problematic; doubts about the extension of the concept of set were less 
important, cf. footnote 12. 
pair, no meaning is assigned to $a(b)$. This can be met by construing
every function as having only one argument [2]; but, generally,
ambiguities are avoided by the following convention: if for the
objects $a$, $b$ as given or conceived, no sense is assigned to $a(b)$, then
$a(b)$ is put $= a$, say. (Cf. in type theory: if no sense is assigned to
$a \in b$, we regard $a \in b$ as false.) Obviously, for any proposed
axiomatic scheme one has to verify its validity for this convention.
This is illustrated by considering

2.152. The $\lambda$-Calculus. The naive proposal (parallel to the
principle: every property defines a collection) is this. For every
term $t[x]$, built up by means of the application operation from constants
and containing the variable $x$, there is a function $\lambda x\, t[x]$ for
which we have a proof $a_t$: $\pi[a_t; x \cdot (\lambda x\, t[x])(x) = t[x]]$. This is excluded
by the paradoxes. There is a notion $\eta$, $\eta(x) = 0$ if $x \neq 0$, $\eta(x) = 1$
if $x = 0$; it is a notion since any clearly conceived object either is
conceived as 0 or not. Consider the term $\eta(x(x))$; though, by the
convention of 2.151 it is well defined for each $x$, is there a clearly
conceived object $c$ ($c$: for Church) with $c(x) = \eta(x(x))$? No, since
$c(c)$ and $\eta(c(c))$ are different. In short, the existential assumptions
implicit in unrestricted $\lambda$-term formation and conversion are not
correct. The “rule” $c$: for each $x$, take the value $\eta(x(x))$ overlooks
the tacit convention that, for $x = c$, the value is also $c(c)$.
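
The argument of 2.152 can be imitated in a short sketch. It is an analogy only: a Python definition that applies itself does not receive a value by convention, it simply fails to terminate, and that is the behaviour shown here. The names `apply`, `eta`, `has_fixed_point` and `self_application_diverges` are hypothetical, and modelling "no sense is assigned to a(b)" by "a is not callable" is an assumption.

```python
def apply(a, b):
    """Total application in the spirit of 2.151: if a is not a function,
    no sense is assigned to a(b) and the value is put equal to a."""
    return a(b) if callable(a) else a


def eta(x):
    """The notion of 2.152: 0 if x differs from 0, and 1 if x is 0."""
    return 0 if x != 0 else 1


def has_fixed_point(candidates):
    """True if some candidate v satisfies eta(v) == v (there is none)."""
    return any(eta(v) == v for v in candidates)


def c(x):
    """The naive 'rule': for each x take the value eta(x(x))."""
    return eta(apply(x, x))


def self_application_diverges():
    """c(c) would have to equal eta(c(c)); evaluation never settles."""
    try:
        c(c)
    except RecursionError:
        return True
    return False
```

Since `eta` has no fixed point, no value for `c(c)` could satisfy `c(c) == eta(c(c))`; the rule works for ordinary arguments and breaks down exactly at `x = c`.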

2.153. The two abstract theories in 2.2 below should be compared
with two kinds of set theoretic formalisms. In the first, there are
no $\lambda$-terms, corresponding to set theories where there are no comprehension
principles at all, but only class formation operations
(intersection, domain, complement, and so on); similarly, here we
only have some basic notions, and combinators, permitting the
formation of sequences, substitution operations and the like. In
the second, proofs and functions are separated, though it is not
excluded that proofs are also functions. Here $\lambda$-terms are used for
introducing functions but with a restriction on types; only quite
elementary existential assertions about proofs are made, essentially
this: if, for a function constant $\phi$, $\phi(a) = 0$ with free variable $a$ has
been derived by formal rules, one asserts that the very thought $a_\phi$
described by this formal derivation satisfies $\pi[a_\phi; x \cdot \phi(x)]$.

2.2. Two Theories . The first is implicit in [39]; the second is 
adapted from [34]. 

2.21. Lower-case letters, $a$, $b$, . . . ($x$, $y$, . . .) are (dummy)
variables for constructions; and constants if they have subscripts, or
0, 1; Greek lower-case letters are predicate constants (with possibly
several arguments) for notions; truth functions $\cdot$, $\bar{\ }$ (and $\Rightarrow$ as
abbreviation); $=$ for (intensional) equality; the main functor is
application $a(b)$; some functors for forming ordered pairs $D$, $D(a, b)$
being written: $(a, b)$, and inverses $D_1$, $D_2$, $D_1(a)$ sometimes written
as $a^1$, $D_2(a)$ as $a^2$; more elegantly, one can use combinators (in the
sense of [3]).

2.211. Terms: individual constants and variables; if $t$ and $s$ are
terms, so are $t(s)$, $(s, t)$, $s^1$, $s^2$. Usual notion of free variable.

2.212. Formulae: For terms $t$ and $s$, $t = s$ (equations); if $\rho$ is a
predicate symbol (with $n$ arguments) and $t_1$, . . . , $t_n$ are terms,
$\rho(t_1, \ldots, t_n)$ is a formula; if $\alpha$ and $\beta$ are formulae, so are $\bar{\alpha}$ and
$\alpha \cdot \beta$. Finally, if $\alpha$ is a formula which does not contain $x$ as a
dummy variable and the term $t$ does not contain the variable $x$,
$\pi(t; x \cdot \alpha[x])$ is a formula; and $x$ is a dummy (bound) variable.
Here $\alpha[x]$ denotes $\alpha$ itself, and $\alpha[t]$ below will denote the formula
obtained by substituting $t$ for $x$ in $\alpha$. $\pi(a; xy \cdot \alpha[x, y])$ stands for

$\pi(a; x \cdot \pi(a(x); y \cdot \alpha[x, y]))$.

2.213. Axioms: Equality axioms,

$(a, b) = (c, d) \Rightarrow (a = c) \cdot (b = d)$,

$D_1((a, b)) = a$, $D_2((a, b)) = b$, all propositional identities.

$\pi(a; x \cdot \alpha[x]) \Rightarrow \alpha[t]$ for all terms $t$ (reflection principle) and if
$\alpha \Rightarrow \beta$ has been formally derived and the variable $x$ does not occur
in $\alpha$,

$\alpha \Rightarrow \pi(a_p; x \cdot \beta[x])$.

The subscript $p$ can be taken to be, for example, the formal derivation
itself. Free variables are suppressed, $a_p$ depending on the
variables occurring in $\alpha \Rightarrow \beta$ other than $x$.

2.214. Rules of Inference. Substitution and modus ponens.
(The reader should verify, for example, the construction of a constant
$a_t$: $\pi(a; xy \cdot \alpha[x, y]) \Rightarrow \pi(a_t(a); yx \cdot \alpha[x, y])$, by use of the two
principal axioms.)

2.215. Below we shall mention, but not use, the reducibility
hypothesis:

If a notion $\rho(a)$ has a constructive characteristic function, then
so has $\pi[a; x \cdot \rho(x)]$ (and similarly with more variables).

More formally; let $\alpha(c, x)$ denote the notion of application:
$c(x) = 0$. We introduce a combinator $C$ (whose arguments and
values are constructions) with the axiom:

$\pi[a; x \cdot \alpha(c, x)] \Leftrightarrow (Cc)(a) = 0$.

For important consequences, see 2.6142. If in some sense, con¬ 
structive functions can be 5^0 only for some a priori limited kind 
of argument, the reducibility hypothesis means this: there is some 
a priori limit on the kind of construction which can be used as a 
proof about any given constructive function. 

This completes the description of our first system. 

2.22. Lower-case letters with type superscripts $a^\tau$, $b^\sigma$, . . . are
variables for functions of type $\tau$, $\sigma$, . . . ; and constants if they have
subscripts; German lower-case letters for proofs; a single relation
symbol $=$ between terms; and $\pi(\mathfrak{a}; a^\tau b^\sigma \ldots \cdot \alpha[a^\tau, b^\sigma, \ldots])$ for
$\alpha$ which are propositional combinations of equations. Application
$t^\sigma(s^\tau)$ if $\sigma$ is of the form $\rho^\tau$. $\lambda$-symbols.

2.221. Terms are built up in the usual way subject to type 
restriction. 

2.222. Formulae: Propositional combinations of equations, or
$\pi(\mathfrak{a}; a^\tau b^\sigma \ldots \cdot \alpha[a^\tau, b^\sigma, \ldots])$.

2.223. Axioms: Equality axioms, propositional identities, and,
for each $\lambda$-term, the schema of $\lambda$-conversion. ($\lambda$-terms can be
replaced by suitable combinators, just as in classical set theory the
comprehension schema can be replaced by use of a finite number of
class operators.)

2.224. Rule of inference: Substitution. 

Also, if $\alpha[a^\tau, b^\sigma, \ldots]$ has been formally proved for free variables
$a^\tau$, $b^\sigma$, . . .

$\pi(\mathfrak{a}_p; a^\tau b^\sigma \ldots \cdot \alpha[a^\tau, b^\sigma, \ldots])$.

2.23. The axioms given above are the basic axioms (for the
foundation of logic); for example, if type 0 variables in the second
theory are intended to be number variables, the principle of definition
by recursion is added by introducing a successor constant $'$ and,
for given constants $b_1$, $c_1$, a constant $a_1$ with the axioms:

$a_1(0) = b_1$, $a_1(a') = (c_1(a))(a_1(a))$

where $b_1$ is of type $\tau$, $a_1$ of type $\tau^0$, therefore $a$ of type 0, and $c_1$ of
type $(\tau^\tau)^0$; cf. [34].
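
The two recursion axioms of 2.23 can be read as a program. The following is a minimal sketch, assuming natural numbers are modelled by non-negative Python integers and objects of higher type by Python functions; `recursor` is a hypothetical name, while `b1`, `c1`, `a1` follow the constants in the axioms.

```python
def recursor(b1, c1):
    """Return a1 with a1(0) = b1 and a1(n + 1) = c1(n)(a1(n))."""
    def a1(n):
        value = b1
        for k in range(n):
            # one application of the second axiom: a1(k') = (c1(k))(a1(k))
            value = c1(k)(value)
        return value
    return a1


# values of type 0: factorial, with b1 = 1 and c1(k)(v) = (k + 1) * v
factorial = recursor(1, lambda k: lambda v: (k + 1) * v)

# values of higher type: a1(n) is itself the function m -> m + n
add = recursor(lambda m: m, lambda k: lambda f: (lambda m: f(m) + 1))
```

The second example shows why the value type is left open in the axioms: the recursion produces a function at each stage, which is what makes the scheme stronger than recursion on numbers alone.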
2.3. Two Meanings of Logical Operations; Validity of Formal Rules
of Predicate Logic (hence: reduction of logical primitives to the 
primitives of 2.2). 

Formulae in the notation of predicate logic are considered, built
up from predicate symbols $P_i$ with $n$ arguments. N.B. Though
the meanings are different in the two cases we use the same symbols:
$\sim$ (not), $\&$ (and), $\lor$ (or), $\supset$ (implies), $(x)$ (for all $x$), $(Ex)$ (there is
an $x$).

2.31. Throughout we show at most one free variable. Each
predicate symbol is assumed to be interpreted, that is, it is understood
what it means that $a$ is a proof that $x$ satisfies $P_i$, that is,
there is given a notion $\rho_i(a, x)$. By induction one builds up the
notions expressing: $a$ is a proof of a (complicated) formula $A(x)$,
denoted by $\Pi[a; A(x)]$:

$\Pi[a; P_i(x)]$ is $\rho_i(a, x)$; $\Pi(a; A \,\&\, B)$ is $\Pi(a^1; A) \cdot \Pi(a^2; B)$,

B(a; A v B) is U(a l ; A) • n(a 2 ;P); n[a; (Ex)A(x)] is U[a u , 4(a 2 )]; 

$\Pi[a; (x)A(x)]$ is $\pi(a^1; x \cdot \Pi[a^2(x); A(x)])$;

$\Pi[a; A \supset B]$ is $\pi[a^1; x \cdot \Pi(x; A) \Rightarrow \Pi(a^2(x); B)]$.

Evidently, for each well-formed expression $A(x)$ of predicate logic,
$\Pi[a; A(x)]$ is a formula of 2.21 built up from $\pi$ and $\rho_i$. We have

2.311. A formula $A$ of predicate logic is formally derivable by
means of the usual rules, if and only if for some constant $a_A$ of 2.21,
$\Pi(a_A; A)$ is a theorem of 2.21.

2.312. In contrast to 1.831 and 1.832, 2.311 is stated as a proof 
theoretic equivalence, and not as an implication. The latter case 
will be considered in 2.74 (completeness problem). 

2.32. For the second meaning, each predicate symbol is assumed
to be interpreted in the form: it is understood what it means that
$(\mathfrak{a}, a^\tau)$ is a proof that $x$ satisfies $P_i$, here given by a notion $\rho_i(x, a^\tau, b^\sigma)$
($a$ and $b$ are sequences), such that $\pi[\mathfrak{a}; b^\sigma \cdot \rho_i(x, a^\tau, b^\sigma)]$. If $P_i$ is
decidable (a notion), $\rho_i$ is independent of $a$ and $b$.

We drop type superscripts; the understanding is that the types 
used are coherent. Though a variable a for proofs is shown, for 
easy comparison with 2.31, note that no constructions on proofs are 
assumed except for trivial pair formation and specialization. 
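We write $\Gamma(\mathfrak{a}; a; A)$; each such $\Gamma$ will have the form:

$\pi(\mathfrak{a}; b \cdot \alpha[a, b])$ (using $\pi(\mathfrak{a}; b \cdot \alpha[b]) \cdot \pi(\mathfrak{a}'; c \cdot \beta[c]) \Leftrightarrow \pi[(\mathfrak{a}, \mathfrak{a}'); bc \cdot \alpha(b) \cdot \beta(c)]$).

$\Gamma[\mathfrak{a}; a; P_i(x)]$ is $\pi[\mathfrak{a}; b \cdot \rho_i(x, a, b)]$,

$\Gamma(\mathfrak{a}; a; A \,\&\, B)$ is $\Gamma(\mathfrak{a}^1; a^1; A) \cdot \Gamma(\mathfrak{a}^2; a^2; B)$,

$\Gamma(\mathfrak{a}; a; A \lor B)$ is $\pi(\mathfrak{a}; bc \cdot \{(a^1 = 0 \Rightarrow \alpha[a^2, b]) \cdot (a^1 \neq 0 \Rightarrow \beta[a^2, c])\})$
where

$\Gamma(\mathfrak{a}; a; B)$ is $\pi(\mathfrak{a}; c \cdot \beta[a, c])$;
$\Gamma[\mathfrak{a}; a; (Ex^\tau)A(x)]$ is $\Gamma[\mathfrak{a}; a^1; A(a^2)]$;

$\Gamma[\mathfrak{a}; a; (x^\tau)A(x)]$ is $\pi[\mathfrak{a}^1; x^\tau \cdot \pi(\mathfrak{a}^2(x); b \cdot \alpha[a(x), b])]$.

Finally, $\Gamma(\mathfrak{a}; d; A \supset B)$ is $\pi[\mathfrak{a}; ab \cdot (\alpha[a, d^1(a, b)] \Rightarrow \beta[d^2(a), b])]$.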

2.321. If a formula $A$ of predicate logic is formally derivable by
means of the usual (intuitionistic) rules, then there is a proof constant
$\mathfrak{a}_A$, and a function constant $a_A$ such that $\Gamma(\mathfrak{a}_A; a_A; A)$ is derivable in 2.22. If $A$
is derivable in arithmetic then there is a constant $a_A$ of the system
extended by primitive recursion 2.23.

2.322. The converse of 2.321 does not hold since for $A$: $\{(x)[P(x) \lor \sim P(x)] \,\&\, \sim\sim(Ex)P(x)\} \supset (Ex)P(x)$, we have $\Gamma(\mathfrak{a}_A; a_A; A)$, but
$A$ is not formally derivable. In other words, the usual rules are
not complete for the meaning given to the logical operations in 2.32.
This is not surprising: if $P_i(x)$ is itself decidable, that is, $\rho_i(x, a, b)$
is independent of $a$ and $b$, the interpretation of $\sim\sim(Ex)P(x)$ is
just that of $(Ex)P(x)$.

2.33. Interpreting the Interpretations of the Logical Connectives. 
Evidently, the trivial axioms of 2.21, 2.22 leave wide latitude to 
what one is to mean by proof or constructive functions; obviously 
one cannot expect that the elementary practice of logic or arithmetic 
determine such general ideas. The examples below give some idea 
of alternative interpretations. (The conceptual defect of ambiguity 
will be turned into proof theoretic reductions.) 

2.331. Alternatives for the axioms 2.22 are these: (i) constructive 
functions of constructive functions, etc., as in [34], (ii) additional 
continuity requirements (for the topology of [46]), and (iii) the 
modification of (ii) in [48] p. 154. 

2.332. Which of these alternatives is adopted may affect the 
relation between 2.31 and 2.32. Here are some examples for arith¬ 
metic A : 
2.3321. For prenex $A$, the two meanings are equivalent; for
negations of such $A$, 2.32 is stronger than 2.31 provided the same
functionals are considered. So here the relation is clear.

2.3322. Let A~~ be the negative version of A [where (Ex) is 

replaced by A v B by ~(^A & ~B)] and A the usual 

classical prenex form of ~A. Then for continuous functionals 
[2.331 (ii) or (iii)] of type 0°, ^^A, A~~ y are all equivalent on 
2.32, but not on 2.31. There we only have ~~A 2) A~~ 2) ~A 
(for prenex A ). 

Which of these alternatives is intended will come up again in 
2.5 (page 134) and above all in 4.4. 

2.333. For purely formal proof theoretic results in Section 4, note 

2.3331. If in 2.32, arbitrary (nonconstructive) functionals are 
used instead of those of 2.2, the logical connectives satisfy the laws 
of classical logic instead of 2.321. 

2.3332. If A is a formula of finite type (not necessarily arithmetic), 
and A i the interpretation by 2.32 of A”, then A <=> A x can be 
proved by classical logic of finite order (without comprehension 
axiom) by use of the axiom of choice: $\forall a^\tau \exists b^\sigma C(a, b) \Rightarrow \exists c^{\sigma^\tau} \forall a^\tau C[a, c(a)]$
restricted to quite elementary quantifier-free $C$ only [46, p. 120].

2.34. Axiom of Choice. The axiom of choice $(x)(Ey)A(x, y) \supset (Ez)(x)A[x, z(x)]$
holds on both interpretations for either unrestricted
quantifiers $x$, $y$, or for $x$, $y$ restricted to decidable ranges.
For 2.32, and decidable $A$,

$(x)[(z)A(x, z) \supset (Ey)B(y)] \supset (x)(Ey)[(z)A(x, z) \supset B(y)]$,

but not generally on 2.31. Generally, for unrestricted $A(x)$:
$(x)[A(x) \supset (Ey)\{A(y) \,\&\, B(y)\}]$ asserts on 2.31 the existence of a
function of two variables $f(x, a)$:

U[a; A(x)] =*n(/ n (x, a); A[f l \x , a)]) *n(/ 21 (x, a); B[f 2 (z, a)]). 

An “object” in the range A(x) is a pair, namely an object x and a 
proof a that x satisfies A(x). —N.B. Even if y is known to be 
unique , one cannot generally prove by known principles that the 
quantifier can be brought forward (3.3221). 17 

2.35. Definitional and Extensional Equality. For given terms t 
and s (even constants), t = s may be formally undecided; for, 

17 Let $(Ev)A(x, v)$ be nonrecursive, $A$ primitive recursive; so that $(x)\{(Ev)A(x, v) \supset (E!y)(z)[A(x, y) \,\&\, [z < y \supset \sim A(x, z)]]\}$; if $y$ could be brought forward
every recursively enumerable set would have a constructive characteristic
function.
in an interpretation one may have in mind more than just the rules
of conversion explicitly set out in the axioms. One interpretation 
of 2.22 is that the operations denoted are simply those fully described 
by the conversion rules themselves (in terms of application and sub¬ 
stitution). Though abstract thought is needed to recognize the 
fact that they give a full description, the equality relation (inter¬ 
convertibility of terms) may be expected to be mechanical, that is, 
recursive. For the theory 2.22, this is verified in [67] and ana¬ 
lyzed. 18 It should be noted that not all clearly understood rules 
(even for purely number theoretic functions) can be described in 
terms of application and substitution: consider a formal language 
and a meaning; if a sequence of symbols denotes a constructive 
proof a of an existential assertion ( Ex)A{x ), take the object a 1 of 
that proof; otherwise take 0. Proof theory for intuitionistic formal 
systems often reduces this to a recursive rule. 

In 2.21, extensional equality $a \equiv b$ means: $(x)[a(x) = b(x)]$. If
$a$, $b$ are not given as universally defined constructions, for our particular
convention in 2.151 we then have $a \equiv b$ only if $a = b$.

In 2.22, this is a weaker sense; for type 0 objects, $a \equiv b$ means:
$a = b$. The stronger sense is defined by induction with respect to
the finite type structure. $(x)[a(x) = b(x)]$ and $(x)(y)[x \equiv y \supset \{a(x) \equiv a(y) \,\&\, b(x) \equiv b(y)\}]$.

2.4. Functions of Higher Type or Logic (Proofs)? (Cf. 2.131.) All
logical operations are to be regarded as either primitive or in the 
sense 2.31 (not 2.32). The present section (2.4) contains mainly 
proof theoretic interreducibility results. 

2.41. To give an idea of the facts we need some definitions. 

2.411. $T$ is the quantifier free system (of primitive recursive
arithmetic) of finite types of 2.321, that is, of [34], with intuitionistic
propositional logic, that is, not $t = s \lor t \neq s$ for all terms of
higher type, $T_D$ ($D$: for definitional), with truth functions applied to
all equations, $T_E$ ($E$: extensional) with the rule, from $P \supset t[x] = s[x]$


18 So to speak: a well-defined minimal model of 2.22. It has always been 
evident that, except in trivial cases, no set of axioms can be expected to deter¬ 
mine a unique interpretation of the kind above, unless supplied by some tacit 
convention like minimality. More surprisingly the situation has turned out 
to be completely parallel for general models in classical logic (relativity and 
incompleteness theorems). Except for details, the role of formal systems is 
much the same in Sections 1 and 2. 
(only equations between terms of lowest type) infer $P \supset u(\lambda x\, t[x]) = u(\lambda x\, s[x])$.

2.412. For a set $[\tau]$ of types, $H_{[\tau]}$ is $T$ restricted to types in $[\tau]$,
with quantifiers, intuitionistic logic, induction applied to all formulae,
and the axiom of choice applied to quantifier free $A$: if $\xi$, $\eta$,
$\zeta$ $(= \eta^\xi)$ are in $[\tau]$, $(a^\xi)(Eb^\eta)A(a, b) \supset (Ec^\zeta)(a^\xi)A[a, c(a)]$. $H$ is first
order arithmetic, $H_1$ has also variables for number theoretic functions
with a finite number of arguments. These are the basic systems;
more axioms will be added.

2.42. By [34], $T$ is much stronger than primitive recursive
arithmetic ($\epsilon_0$ against $\omega^\omega$). In the presence of logic the situation
is quite different: finite types are absorbed.

2.421. $H_\omega$ with the unrestricted axiom of choice (all $A$) is reduced
to $T$, literally as in [34]. Conversely,

2.422. $T$, $T_E$, $T_D$ are interpreted in $H$: $T_D$ by [67]; for $T_E$,
use 2.331 (ii), that is, use first $H_1$, define in $H_1$ neighborhood functions
for the topology [46], by conditions $C_\tau^E(a)$, interpret $(a^\tau)$ as
$(a)[C_\tau^E(a) \supset\ ]$, and verify the axioms of $T_E$. For reference below:
by modifying $C_\tau^E$ [2.331 (iii)], one gets an interpretation for $T$ with
weak extensionality of 2.35, and in addition a modulus of continuity
functional for operations on sequences. Then $H_1$ is interpreted in
$H$ by letting $a$ range over the recursive functions, the basic axiom
of choice being satisfied.

2.423. For extensions to transfinite types, cf. [67]. The results 
have the same character for suitable types other than $\omega$. Since the
existential axioms of [67] require an ordinal a to be given the only 
self-contained scheme provided by [67] for introducing functions 
is this: a type a is introduced in the quantifier free theory if a has 
been proved to be a well ordering (in the obvious sense). We get to 
all ordinals < T (3.51); by 3.531, the same is reached if one considers 
the (intuitionistic) ramified hierarchy of logically definable prop¬ 
erties (species) of natural numbers, and iterates it a times if a has 
been proved to be a well ordering in an earlier system (cf. 2.153). 
Note that the reduction of 2.422 cannot be applied since, for
transfinite $\tau$, $C_\tau^E$ is not definable in $H_1$.

2.43. Two comments on the axiom of choice. First its proof
theoretic strength in intuitionistic logic is much weaker than in
classical logic; for instance, the addition of the law of the excluded
middle to $H_1$ and the unrestricted axiom of choice at the lowest level
gives full classical analysis, in contrast to 2.421. Second, while $T_E$
is satisfied by the functionals described in 2.422, $H_\omega$ with the
unrestricted axiom of choice is not: apply it to
$(a^2)(a^1)(En)(b^1)[(m < n)(b^1 m = a^1 m) \supset a^2(a^1) = a^2(b^1)]$. This is quite consistent
with 2.34 since the modulus of continuity $n$ depends not only on a
neighborhood function $a^{(2)}$ of $a^2$ but on the proof of $C_2(a^{(2)})$.

2.5. Free Choice Sequences; objects (to study) or figures of speech
(to get rid of)? The obvious source is this: most constructive operations
on functions, for example, decimal expansions, are continuous,
using only finite sequences of values (not the rules for calculation) of
their arguments. So, the latter may be given by approximations
that are thought of as freely chosen. One probably does not attach
any immediate sense to logically complicated expressions with
quantifiers over free choice sequences (denoted by Greek lower-case
letters); cf. points of infinity in geometry, used primarily as shorthand
for talking about parallel lines. If one could find a clearly
conceived idea, that is, a mathematical object, “behind” this figure
of speech, one would have a new primitive idea and possibly new
axioms (about unbiased coin tossing, or impenetrable fate in
Brouwer’s later writings). These axioms have to isolate properties
of the notion: $a$ is a proof of $(\alpha)A(\alpha)$, $(E\alpha)A(\alpha)$ (for complicated
$A$). 19 The implicit assumption 20 is that these properties will be
coherent with the interpretation of the logical operations to which
we are “committed” by 2.31.

For figures of speech the commitment is absolute: there are no
free choice objects and so nothing new is involved in proofs and
the functions that are part of the proofs. For example, suppose,
for some purely arithmetic $A$, $\sim(n)[A(n) \lor \sim A(n)]$. Then
$\sim(E\alpha)(n)\{[A(n) \,\&\, \alpha(n) = 0] \lor [\sim A(n) \,\&\, \alpha(n) = 1]\}$. This may
seem to contradict the freedom left open to $\alpha$ (if one has unconsciously
stepped into a realist or possibly formalist interpretation),
but does not: for a proof of $(E\alpha)(n)\{\ldots\}$ would provide a proof of
$(n)\{\ldots\}$ and further, some particular $a$ that decides between $A(n)$
and $\sim A(n)$.

For free choice sequences as objects, there is a fundamental 
problem: are they to be involved in proofs? Or are only (clearly 
defined) constructions on such objects to occur in proofs? In the 

19 It may be debated whether (i) understanding the meaning of (a) A (a) simply 
means (ii) understanding this relation; all we need is that (i) implies (ii). 

20 As in the whole section; clearly conceived ideas are coherent! 
former case, a is a proof of ~ A , means that it is a proof of nOz, A),
where (on the most liberal interpretation) x may involve free 
choice sequences even if A does not. Thus, if the variables in 
2.31 denote objects which themselves involve free choice sequences, 
the “commitment” is pretty loose! 

An analysis of this second conception, in effect, if not intended, is 
in the monograph [8]. 21 

It verifies completely what one expects: on the basis of really 
evident axioms for free choice sequences one does have strong con¬ 
tinuity properties, but ~A can be asserted essentially only if A 
is false for arbitrary functions (modulo the continuity properties); 
cf. 2.6. A parallel situation arises with the interpretation 2.32 if 
continuity requirements are imposed on the functionals used in inter¬ 
preting the connectives, cf. 2.331 (ii). 

The axioms below are clear; that is, our idea of free choice 
sequence may be so vague that it leaves most questions undecided, 
but not these. But they turn out to be reducible to arithmetic— 
though, (from the wrong point of view) they look inconsistent; cf. 
1.643. In 2.5, for studying the main axiom (misnamed: bar 
induction) we seriously regard free choice sequences as figures of 
speech. As is to be expected when something is “analyzed away,”
much sharper proof theoretic consequences are obtained than in [8].

Also, and this is probably more important, if the main axiom is 
accepted, the elimination of free choice sequences (of constructive 
objects) is a consequence . The status of other kinds of freely chosen 
object, that is, not only freely chosen sequences of constructive 
objects, is probably the principal open question of the subject; cf. 
4.31. 

2.51. (Cf. 2.412). H ^ r ] is obtained from H[ r ] by extending the 
language and axiom schemata by adding variables a* (for f°G W); 
they are treated as f° objects in the formation rules (a*)(rc)(I£a { ) 
[a*(n) = a*]. But if constructions are not to involve free choices, 
there are no constructions with a* objects as values. What one 
can do is to take operations: £° —* r? (with £ fr]) as new basic types, 
and extend the schemata of $T$ to these as in 2.6244.

21 Recursive realizability [8]. In contrast to 2.3, this analysis takes all logical
operations as primitive except $(Ex)$ ($A \lor B$ is replaceable by $(Ex)[(x = 0 \supset A) \,\&\, (x \neq 0 \supset B)]$)
which occurs only in the special context: $\{e\}(x)$ is defined.
$(x)$ is applied to formulae of complicated structure. Nevertheless, it is sometimes
proof theoretically very useful, for example, 4.121.

Here we consider $[\tau] = 1$ only (sequences of natural numbers), and start with
$H^1$, obtained from $H$ by adding, for all $A$, the axiom of choice:
$(n)(E\alpha)A(n, \alpha) \supset (E\alpha_1)(n)A_1(n, \alpha_1)$; here one uses the following
convention: let $\langle n, m \rangle$ be a numbering of pairs of natural numbers,
and $A_1(n, \alpha_1)$ obtained from $A(n, \alpha)$ by replacing each $\alpha(t)$ by
$\alpha_1(\langle n, t \rangle)$, in short, if $A(n, \alpha)$ is $A[n, \lambda x\, \alpha(x)]$, $A_1(n, \alpha_1)$ expresses
$A[n, \lambda x\, \alpha_1(\langle n, x \rangle)]$. ($H^1_{[\tau]}$ is analogous, but 2.412 is restricted to
$A$ which do not contain free Greek lower-case variables.)
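
The pairing convention of 2.51 can be made concrete by a small sketch. The text only requires some numbering of pairs; the Cantor pairing used below is an assumption made for the example, and the names `pair`, `unpair`, `merge` and `section` are hypothetical.

```python
def pair(n, m):
    """Cantor numbering of pairs of natural numbers (one possible choice)."""
    return (n + m) * (n + m + 1) // 2 + m


def unpair(k):
    """Inverse of pair: return (n, m) with pair(n, m) == k."""
    w = 0
    while (w + 1) * (w + 2) // 2 <= k:
        w += 1
    m = k - w * (w + 1) // 2
    return w - m, m


def merge(family):
    """Code a family n -> alpha_n as one sequence alpha1 with
    alpha1(pair(n, t)) == family(n)(t)."""
    def alpha1(k):
        n, t = unpair(k)
        return family(n)(t)
    return alpha1


def section(alpha1, n):
    """Recover the n-th sequence: the function x -> alpha1(pair(n, x))."""
    return lambda x: alpha1(pair(n, x))
```

With this coding a single sequence carries a whole family of sequences, which is what allows the conclusion of the axiom of choice to be stated with one existential quantifier over sequences.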

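2.511. Continuity. Let $C(\gamma)$ [$C_0(\gamma)$] express that $\gamma$ is a (freely
chosen) neighborhood function into the space of sequences $\alpha$ of
natural numbers [respectively, into space of natural numbers], that
is, if $\bar{\alpha}m$ denotes $(\alpha(0), \ldots, \alpha(m - 1))$, $\bar{\alpha}0 = (\,)$, German letters
denote finite sequences of natural numbers, then

$(n)(\alpha)(Em)[\gamma(\langle n, \bar{\alpha}m \rangle) \neq 0] \,\&\, (n)(\mathfrak{m})(\mathfrak{n})\{[\gamma(\langle n, \mathfrak{n} \rangle) \neq 0 \,\&\, \mathfrak{n} \subset \mathfrak{m}] \supset \gamma(\langle n, \mathfrak{n} \rangle) = \gamma(\langle n, \mathfrak{m} \rangle)\}$

[with the first variable $n$ missing in $C_0(\gamma)$]. Let $\Gamma(n, \alpha)$ be the functional
defined by $\gamma$. For all $A$

$(\alpha)(E\beta)A(\alpha, \beta) \supset (E\gamma)[C(\gamma) \,\&\, (\alpha)A_2(\alpha, \gamma)]$

where $A_2(\alpha, \gamma)$ expresses $A[\alpha, \lambda n\, \Gamma(n, \alpha)]$.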
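
How a neighborhood function determines a functional can be sketched for the simpler case without the first variable n. The text does not spell out how the value is read off; the sketch assumes the common convention that the value is the first non-zero value of the neighborhood function along the initial segments of the argument, minus 1. The names `functional_from_neighborhood`, `gamma_example` and the bound `max_len` are hypothetical.

```python
def functional_from_neighborhood(gamma, max_len=10_000):
    """Return the functional that searches the initial segments of alpha
    for the first one on which gamma is non-zero.
    Assumed convention: the value is that non-zero code minus 1."""
    def functional(alpha):
        prefix = ()
        for m in range(max_len + 1):
            code = gamma(prefix)
            if code != 0:
                return code - 1
            prefix = prefix + (alpha(m),)
        raise ValueError("gamma stayed 0 on all segments up to max_len")
    return functional


def gamma_example(segment):
    """Example neighborhood function: undecided (0) on segments shorter
    than 3, afterwards 1 + the sum of the first three values."""
    if len(segment) < 3:
        return 0
    return 1 + sum(segment[:3])


first_three_sum = functional_from_neighborhood(gamma_example)
```

The functional uses only finitely many values of its argument, so two sequences that agree on the inspected initial segment receive the same value; this is the continuity expressed by the second conjunct of the condition.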

2.512. $H^1$ with 2.511 is reduced to $H$ by 2.321 and 2.422 with
modified $C_\tau^E$. One has extensionality $(x)[\alpha(x) = \beta(x)] \supset [A(\alpha) \supset A(\beta)]$
since by the formation rules $\alpha$, $\beta$ enter only through their
values.

2.52. To state more significant facts, we consider H x 1 and the 
following basic axiom: 

2.521. S(a) [read: a is a constructive spread] means that a defines 

a set of permitted initial segments, with a ranging over finite 
sequences of numbers and * denoting concatenation: (a) (Ex)[a(a) = 0 
D a(a*x) = 0], (a)(a' C o)[a(a) = 0 D a(a') = 0], and a G a: 

(x)[a{3x) = 0]. Then, for all A{a) containing possibly bound, but 
no free Greek lower-case letters, 

A(a) D C Ea)[S(a ) 4a£a4tfG a)A(0)] 

(the converse being logically valid). For, to be given a is to be 
given a spread. It follows 

2.522. (a){[S(a) & (a G a)A(a)] D [(a G a)B(a)]\ D [A(a) D 
B(a)] (the converse is trivial), and (Ea)A(a) D ( Ea)A(a ) if there
are no free Greek letters. Then (for negative as in 2.3322) 

2.523. First Elimination Results . If A is a negative formula , 
containing bound , but no free Greek variables , there is A f not containing 
any such variables , such that A <-> A', in Hi 1 with 2.521 

(2.511 is of course not needed). Proof by induction on number 
of (universal) Greek quantifiers. Hence 

2.524. For quantifier free A , (a) ~(x)~ A(ax) (a)[5(a) D 

~(6 £fl)^ A (bx)]. 

2.525. Trivially, the extension of H 1 + 2.511 by 2.521 is con¬ 
servative by simply treating a (constructive functions) as a. 

2.53. The last elementary axiom expresses that the whole extension 
of a constructive function (rule) depends only on a finite number of 
values of an a , in contrast to 2.511. For A(a , a) not containing 
free Greek letters other than a : 

2.531. $(\alpha)(Ea)A(\alpha, a) \supset (Eb)(Ec)[C_0(c) \,\&\, (\alpha)A_3(\alpha, b, c)]$ where
$A_3(\alpha, b, c)$ expresses that the function $a = \lambda n\, b(\langle n, \Gamma(\alpha) \rangle)$ satisfies
$A(\alpha, a)$. In other words $(\alpha)(E\beta)$ requires a continuous mapping
from $N^N$ to $N^N$ where domain and range have the product topology,
$(\alpha)(Ea)$ one where the domain has the product topology and the
range the discrete topology.

The trivial method of 2.525 cannot be used to reduce H i 1 + 
2.511 + 2.521 + 2.531 to arithmetic because 2.531 and 2.511 con¬ 
flict. One uses again the method of 2.321. 

To extend 2.523 to all (closed) formulae $A$, one has to analyze
more closely statements of the form $(\alpha)(E\beta)A(\alpha, \beta)$. By 2.511 this
is thrown back on $(\alpha)(Ex)A(\alpha, x)$ [in the definition of $C(\gamma)$] actually
even for quantifier-free $A$. By 2.522, if $(\alpha)(E\beta)A(\alpha, \beta)$ contains
no free Greek letters, $\gamma$ may be taken to be constructive.

The upshot of the analysis is this. On the assumption that an $\alpha$
is given outright by a spread, that is, 2.521, the notion: $a$ is a proof of
$A$, is fully determined for all $A$, if it is known for $(\alpha)(Ex)A(\alpha, x)$. 22
This is in effect the subject of

2.6. Generalized Inductive Definitions; Bar Theorem. A formal
reduction of: $(\alpha)(Ex)$ [or: $(\alpha^\tau)(Ex)$], needs of course a machinery

22 More precisely, to each formula $A$ not containing free variables for free choice
sequences, there is a formula $A'$, equivalent to $A$ on the basis of our axioms and
built up from the predicates $C_0(b)$ of 2.511 without the use of any variables for
free choice sequences.
which does not explicitly involve free choice sequences, and so, at
first, may seem to have nothing to do with them. This requires 
additions 23 either of (axioms from which) new constants and their 
laws (can be derived) or new (basic or transfinite) types. This is of 
independent interest. Since in 2.2 the notion of proof is primitive, 
the old-fashioned idea of inductive definitions suggests itself. 

2.61. Background. Let $A(P, x)$ with the predicate symbol $P$
(monadic; or else n-ary and $x$ a variable over n-tuples of constructive
objects) be well formed, that is, its meaning defined once $P$ is
interpreted. Inductive definitions of (the interpreted species) $P_A$
take the form

2.611. $P_A(x)$ if and only if $P(x)$ can be “proved” from:

$(x)[A(P, x) \supset P(x)]$.

Inverted commas are used, because the notion of proof (2.2)
applies to interpreted statements, while $P$ is a formal letter; so to
speak: nothing should be assumed about $P$. If a clear sense is
given to 2.611, then $P_A$ is interpreted and, for any (interpreted) $Q$,
we have the principle:

2.612. $(x)[A(Q, x) \supset Q(x)] \supset (x)[P_A(x) \supset Q(x)]$.

A quite separate question is whether 2.611 is useful, that is, that
something interesting can be said about $P_A$, in particular

2.613. $(x)[A(P_A, x) \supset P_A(x)]$.

Trivially, for numerical $x$, 2.613 does not hold if $A(P, x)$ is
$(Ey)[\sim P(y) \,\&\, x = y + 1]$.

2.614. What A(P y x) to use? This depends of course on the 
precise sense in which 2.611 is understood. The situation is exactly 
parallel to the classical case. 

23 The need for additions follows from the results below since the formal laws
for (α)(Ex) to be derived, allow one to prove the consistency of the systems
in 2.2.

2.6141. Reduction to definition by induction. If ⊥ is (the)
empty (species), put P_A^0(x) ↔ A(⊥, x), and, for n ≠ 0, P_A^n(x) ↔
(Em < n)A(P_A^m, x). For positive A, that is, conjunctions and
disjunctions of unnegated prime formulae, and n < ω, each P_A^n(x)
is a primitive recursive predicate of x, and P_A(x) ↔ (En < ω)P_A^n(x)
satisfies 2.612 and 2.613. 24
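
The stages of 2.6141 can be illustrated by a small computation. The following Python sketch is only an illustration under stated assumptions, not part of the formal development: it works on a finite domain of numbers (so closure is reached at a finite stage instead of at ω), and the names `stages` and `even_clause` are hypothetical, chosen for this example.

```python
from typing import Callable, FrozenSet, List

Species = FrozenSet[int]
Operator = Callable[[Species, int], bool]  # plays the role of A(P, x)


def stages(A: Operator, domain: range, max_stages: int) -> List[Species]:
    """Return the list [P^0, P^1, ...] restricted to `domain`.

    P^0(x) iff A(empty, x); for n != 0, P^n(x) iff A(P^m, x) for some m < n.
    Iteration stops early once two successive stages coincide.
    """
    empty: Species = frozenset()
    result: List[Species] = [frozenset(x for x in domain if A(empty, x))]
    for n in range(1, max_stages):
        nxt = frozenset(
            x for x in domain if any(A(P_m, x) for P_m in result[:n])
        )
        result.append(nxt)
        if nxt == result[-2]:  # no new elements: closure reached
            break
    return result


def even_clause(P: Species, x: int) -> bool:
    """Hypothetical positive clause: x = 0, or x = y + 2 for some y in P."""
    return x == 0 or (x >= 2 and (x - 2) in P)


if __name__ == "__main__":
    for n, P_n in enumerate(stages(even_clause, range(10), 20)):
        print(n, sorted(P_n))
```

On this domain the last stage is closed under the clause, which is the finite counterpart of 2.613; minimality (2.612) holds because every stage is contained in any Q closed under the same clause.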

In this simple case there are formal rules such that P_A(x) if
P(S^x 0) is formally derivable from (x)[A(P, x) ⊃ P(x)]. For
quantified A we do not have closure at ω, nor formal rules; that is,
the sequence P_A^n is continued into the transfinite, and thus the
analysis thrown back on a theory of ordinals. These in turn are
introduced by an inductive definition: (i) In the restricted case of
(primitive) recursive ordinal notations, the binary relation <_O is
inductively defined by O(<_O, x, y):

(x = 1 & y = 2) v (1 <_O x & y = 2^x) v (Ez)(x <_O z & z <_O y)
v (Ez)(Em){y = 3·5^z & x = {z}(m) & (n)[{z}(n) <_O {z}(n + 1)]}.

(ii) For the general case, cf. 2.8. Thus this reduction of inductive
definitions is nothing else but the reduction of a class of such P_A
to a particular one, namely, P_O, where P_O(x): 1 <_O x (x is an ordinal
notation). For example, for all purely universal positive A, P_A can
be explicitly defined from P_O. It does not seem to be known if this
holds for all monotonic A.

2.61411. Much of the literature on recursive ordinals is done
nonconstructively. Closer inspection shows that this theory is
much more elegant if done intuitionistically. For instance, for
either a finitist or a nonconstructive recursion theoretic approach,
it comes as a surprise that, for example, (normal ordinal) functions
defined by induction on the highly nonrecursive set O turn out to be
primitive recursive. (This is shown by more or less ad hoc applications
of the recursion theorem.) In contrast, harmless looking
functions, such as addition, on some specific segment of O, need not
be recursive at all. This is exactly what one expects on Church’s
thesis (2.7) if one considers the constructive meaning of the logical
operators (constructions on abstract objects).

2.6142. Reduction to the abstract theory of constructions
requires, in the notation of 2.2, a notion p_A(a, x) [a is a proof of
P_A(x)] for which 2.613 and 2.612 can be verified for the Q considered.
The obvious analogue to Dedekind's approach in set
theory is this: a proves that for all notions P with constructive
characteristic function p [of two variables: b is a proof of P(x)],
(P){(y)[A(P, y) ⊃ P(y)] ⊃ P(x)}. Just as in set theory, this
notion can be expressed in 2.21, but need not have a constructive
characteristic function; thus if Q may contain P_A, 2.612 does not
follow. It does follow from the reducibility hypothesis of 2.2; this
also implies essentially the comprehension axiom if it is assumed
that the notion ν of being a natural number has a constructive
characteristic function. It does not seem to be known whether
obvious weaker conditions are sufficient to derive 2.612, and 2.613
(for proper A); perhaps one only needs that the binary relation: a
is a proof that x is a natural number, have a constructive characteristic
function (and not a completed set ν). 25

24 In the classical case, a weaker condition on A is sufficient, namely monotonicity;
that is, writing P ⊂ P′ for (y)[P(y) ⊃ P′(y)], (x){[A(P, x) & P ⊂ P′] ⊃ A(P′, x)}.
This is no longer true here; let A be positive, P_A nonrecursive,
and let B(P, x) be ~~P(x) v A(P, x). Then P_B(x) ↔ P_B^{ω+1}(x) can
be proved, but not ↔

2.615. Leaving for the moment the question of a more thorough
constructive analysis of inductively defined P_A, we shall use them
as a means of analysis; at least for positive A. 26 For it is intuitively
clear that any satisfactory theory of constructivity will include
them.

2.6151. Note that the principle of proof by transfinite induction
on O is a particular case of 2.612; 2.6152: not, however, that every
Q has a first element; 2.6153: (α)(Ex) ~[α(x + 1) <_O α(x)]
(trivially, that is, using only (α)(Eβ)(x)[β(x) = α(x + 1)]).

2.62. The Quantifier (α)(Ex). Consider a decidable property B
of sequences of natural numbers and, as above, write ᾱx for the
sequence α(0), ..., α(x − 1) (ᾱ0 = < >). Brouwer [20] proposed
that a proof a of (α)(Ex)B(ᾱx) has the form: 27 either for some
fixed x_0, a proves (α)B(ᾱx_0) or else it proves (n)(α)[α(0) = n ⊃
(Ex)B(ᾱx)]. Put precisely, we consider a property P(a) of
formulae of the form (α)[a ⊂ α ⊃ (Ex)B(ᾱx)]: (a){[B(a) ⊃ P(a)]
& [(n)P(a * n) ⊃ P(a)]}, that is, (a){[B(a) v (n)P(a * n)] ⊃ P(a)},
which is an inductive definition of P as in 2.611, with B(x) v
(n)P(x * n) for A(P, x); if B is decidable, this is purely universal.
Rephrased in terms of functionals (neighborhood functions): either
x(α) is constant or, for a sequence x_n, x(α) = x_{α(0)}(α*) where
(n)[α*(n) = α(n + 1)]. Define a species K of functions of finite
sequences

25 Cf. the corresponding problem in classical analysis, for arithmetic A(P, x): P_A
is defined explicitly; to prove 2.612 for arbitrary Q one needs full comprehension;
but a model for both 2.612 and 2.613 needs only the one function quantifier form:
for these axioms are satisfied by functions of hyperdegree <0'; this follows
from [30].

26 And P_A in turn only positively, that is, as restrictions of quantifiers (x: P_A(x)),
(Ex: P_A(x)). As a figure of speech: an inductive definition tells you what is in
P_A, not what is not in P_A. If there is something behind this idea, it probably
has to be analysed by a different logic since, on 2.2, the meaning of ~P_A(x) is
determined if P_A(x) is determined.

27 Or: “can” be put into this form; as any logical proof “can” be brought into
the form described by certain formal rules.
sequences 

2.621. e ∈ K: e(< >) = 0 & {(a)[e(a) = e(<0>) ≠ 0] v
(n)(Eb)(a)[e(n * a) = b(a) & K(b)]}.

Thus we express the proposed condition on proofs by: 

2.6211. (α)(Ex)A(α, x) ↔ (Ee ∈ K)(a: e(a) ≠ 0)(a ⊂ α)A[α, e(a) − 1].
Note that the quantifier (α) is pushed past (Ex), and appears only
with a decidable restriction: a ⊂ α.
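
How a neighborhood function determines a value on a sequence can be made concrete. The Python sketch below is an illustration under assumptions, not the text's formal apparatus: finite sequences are modelled as tuples, a sequence α as a Python function, and the search is cut off at a hypothetical bound `max_len`. The function `e_example` is an invented instance (it codes the functional α ↦ α(α(0))) which is 0 on the empty sequence and, once nonzero, keeps its value on all extensions.

```python
from typing import Callable, Tuple

FiniteSeq = Tuple[int, ...]
Neighborhood = Callable[[FiniteSeq], int]


def apply_neighborhood(
    e: Neighborhood, alpha: Callable[[int], int], max_len: int = 1000
) -> Tuple[int, int]:
    """Return (x, value) for the least x with e(alpha(0), ..., alpha(x-1)) != 0.

    The value is e(...) - 1, following the convention that 0 means
    "not yet determined" and n + 1 codes the answer n.
    """
    for x in range(max_len + 1):
        segment = tuple(alpha(i) for i in range(x))
        code = e(segment)
        if code != 0:
            return x, code - 1
    raise RuntimeError("no nonzero value found within max_len")


def e_example(a: FiniteSeq) -> int:
    """Hypothetical neighborhood function for alpha -> alpha(alpha(0))."""
    if not a or a[0] >= len(a):
        return 0  # segment too short to read position alpha(0)
    return a[a[0]] + 1


if __name__ == "__main__":
    # alpha(n) = n + 1, so alpha(alpha(0)) = alpha(1) = 2
    print(apply_neighborhood(e_example, lambda n: n + 1))
```

The point of the coding is that only a finite initial segment of the sequence is ever inspected, which is what allows the quantifier over sequences to be traded for quantifiers over finite sequences.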

2.622. Second Elimination Result. In H_1^1 extended by all the
additional axioms of 2.5, 2.612 and 2.613 for the particular K of
2.621, for every closed formula A, there is a formula A′ not containing
any Greek variables at all, such that: A ↔ A′.

2.623. Some theorems of this theory (2.622). First, the converse
to 2.6153; this follows from Brouwer’s bar theorem: if a decidable
(binary) relation R, not necessarily an ordering, is well founded in
the sense (α)(Ex) ~[α(x + 1) R α(x)] then proof by transfinite
induction on R applies to arbitrary Q, that is, (x)[(y R x)Q(y) ⊃
Q(x)] ⊃ Q(x). This in turn follows from

2.6231. If R(α, x) is monotone (not necessarily decidable), that is,
(α)(x)[R(α, x) ⊃ R(α, x + 1)], and Q is an arbitrary property of
finite sequences then

{(α)(Ex)R(α, x) & (α)(x)[R(α, x) ⊃ Q(ᾱx)]

& (a)[(n)Q(a * n) ⊃ Q(a)]} ⊃ (a)Q(a).

And this is proved by induction on K. Conversely 

2.6232. If Brouwer’s bar theorem and the two axioms for K are 
assumed we get 2.6211. 

2.6233. If K is (explicitly) defined in H_1^1 by: e(< >) = 0 &
(α)(Ex)(y)[e(ᾱx * y) = e(ᾱx) ≠ 0], 2.621 and the corresponding
principle of induction are theorems of H_1^1 together with the bar
theorem. N.B. The familiar conversion of inductive into explicit
definitions (footnote 25) is applicable in the intuitionistic case for
a more restricted class of A(P, x) in 2.613 than in the classical case
(cf. footnote 24).

2.6234. Not only do the e in K define continuous functionals,
but uniform continuity can be proved on functions dominated by
a (fan theorem), x being the modulus of continuity:

(a)(e ∈ K)(Ex)(b)(c)({(z < x)[b(z) = c(z)] &

(z)[b(z) < a(z) & c(z) < a(z)]} ⊃ [e(b̄x) = e(c̄x) ≠ 0]).
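
For a concrete neighborhood function the uniform modulus of 2.6234 can be found by exhaustive search over the finitely many bounded initial segments. The sketch below is a hedged illustration only: `uniform_modulus`, `e_example` and the cut-off `max_depth` are hypothetical names for this example, "dominated by a" is read as b(z) < a(z) following the displayed formula, and `e_example` is an invented function that keeps its value once it is nonzero.

```python
from itertools import product
from typing import Callable, Tuple

FiniteSeq = Tuple[int, ...]


def uniform_modulus(
    e: Callable[[FiniteSeq], int],
    bound: Callable[[int], int],
    max_depth: int = 20,
) -> int:
    """Least x such that e is nonzero on every segment of length x
    whose z-th entry is below bound(z)."""
    for x in range(max_depth + 1):
        choices = [range(bound(z)) for z in range(x)]
        # product() over x factors enumerates all bounded segments of length x
        if all(e(segment) != 0 for segment in product(*choices)):
            return x
    raise RuntimeError("no uniform modulus found within max_depth")


def e_example(a: FiniteSeq) -> int:
    """Hypothetical neighborhood function for alpha -> alpha(alpha(0))."""
    if not a or a[0] >= len(a):
        return 0
    return a[a[0]] + 1


if __name__ == "__main__":
    # entries restricted to {0, 1, 2}: alpha(0) <= 2, so length 3 always suffices
    print(uniform_modulus(e_example, lambda z: 3))
```

Two bounded sequences that agree below the returned x share the same segment of length x, so the functional takes the same nonzero code on both, which is the conclusion of the displayed formula.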

2.6235. By essential use of the continuity axioms 2.5, the bar
theorem of type 1 (for relations R between type 1 objects) can be
proved in 2.622, when formulated as follows:

(α)(Ex) ~[λy α(<x + 1, y>) R λy α(<x, y>)]

⊃ {(α)[(β R α)Q(β) ⊃ Q(α)] ⊃ (α)Q(α)}.

2.624. Refinements and Alternatives. If one is interested in
consistency: there is a finitist argument showing

2.6241. If A is a theorem of 2.622, A′ can be proved in H_1 extended
by the axiom of choice and the two axioms for K. Consistency
may be in question because 2.622 cannot be extended to a nonconstructive
theory: it is inconsistent if the law of the excluded middle
is added, while H_1 with K is not.

The latter system may still look stronger than it is because K
is a set of functions. However,

2.6242. H_1 etc. may be reduced to H itself with the corresponding
inductive definitions for recursive neighborhood functions.
The application of the axiom of choice needed is in the form:

(n)(Ee)[(e ∈ K) & A(n, e)] ⊃ (Ee_1){e_1 ∈ K & (n)A[n, λa e_1(n * a)]};

this is supplied by the recursion theorem. Also

2.6243. H_1 etc. may be reduced to H^1 with the bar theorem.
This latter system is of the same strength as that of footnote 25.

If one drops constructive functions altogether, and considers
H^1 + axiom of choice + bar theorem + continuity, as in [8], there
is an alternative route to H^1 + bar theorem itself. First apply the
interpretation 2.321 to get a reduction to a quantifier free system
(BR) (so-called bar recursion of lowest type 4.222 with c ranging over
finite sequences of numbers), then, as in 2.512, define in H^1 neighborhood
functions with modified C r B , and prove the existential axioms
for (BR) by means of the bar theorem.

2.6244. It may be mentioned that in Section VI of the privately
circulated Stanford report on “Foundations of Analysis” there is an
elegant application of 2.321 to the system H_1 with the axioms of
choice and for K, in which two basic types (cf. 2.51) occur in the
resulting quantifier free system T(K). This has the advantage
over 2.6242 of giving simple closure conditions to be satisfied by
models of the system considered, instead of using all recursive
functions.

2.625. It would perhaps be interesting to develop a parallel
theory in which definitional equality (2.35) between constructive
functions is used. Then 2.621 is modified by use of combinators
N (for numerical) and S (for shift): (Nn)(< >) = 0, (Nn)(a)
= n + 1; {S(n, e)}(< >) = 0, {S(n, e)}(a) = e(n * a) and (n)(Nn
∈ K_D) & (e){(n)[S(n, e) ∈ K_D] ⊃ e ∈ K_D}, and to extend the
analysis of [67], which provides a decision function for definitional
equality (cf. 2.35);

Conjecture. The decision function for terms built up from variables
of bounded type is itself definable [in the system T(K_D) of
2.6244].

In this case, free choice sequences α^τ of constructive functions of
higher type can be treated as ordinary free choice sequences chosen
from a formally decidable spread (namely, the spread of numbers
of normal terms of [67]).

2.63. Discussion. The results of 2.4-2.6 give a pretty full picture
of the interreducibility between the theories described. This
interreducibility is complete subject to the conjecture of 2.625;
for then the traditional inductive definitions of constructive objects
of higher type [purely positive A(P, x^τ) in 2.613] are first eliminated
in favor of free choice sequences α^τ (cf. 2.6233), and the latter
reduced by 2.625. Inductive definitions of species P of free choice
sequences are reduced by the elimination results to inductive 
definitions of species of constructive functions: a £ P c «-» [£(a) & 
(a £ a)P(a)]. 

The matter is open for monotone A(P, x^τ).

Note in passing that the objects known as absolutely free (better:
lawless) choice sequences, mentioned on page 110, are not treated here.
They are characterized by the condition that, from a certain point
onward, no restrictions are to be made; that is, they are given by
some initial segment, instead of arbitrary spreads. Therefore
2.521 is replaced by: A(α) ⊃ (Ea)[a ⊂ α & (β ∈ a)A(β)], as in [47].
Though neither kind of choice sequence is a special case of the other,
they are interreducible via generalized inductive definitions. The
lawless sequences have only been used as examples in completeness
proofs (cf. 2.74) and for pedagogic purposes as on page 110.

Concerning the mathematical content of the systems above:
though 2.622 shows weakness compared with full analysis, a good
deal of impredicative analysis (cf. Section 3) can be developed in
H^1 together with bar induction; cf. [41] to which should be added
theorems of the Cantor-Bendixson type.

For a figure of speech expressing continuity, the analysis 2.6212
of the quantifier (α)(Ex) is completely satisfactory (while (α)(Ex)
does not yield the continuity properties). Heuristically this could
be expected: In nonconstructive analysis, we have of course the
ordinary definition of continuity C for mappings: N^N → N (N^N
with the product topology); then it is a theorem that C is the least
class of mappings such that: constants ∈ C, and if φ_0, φ_1, ... ∈ C, so does
λα φ_{α(0)}(α*). Now, it so happens that if one starts with this as
definition, most of the natural proofs simply do not use nonconstructive
principles. As an assertion about intuitive proofs, 2.6241
shows that Brouwer’s proposal is consistent with the remaining
(accepted) axioms; here it may be added that for the usual formal
systems the proposal is actually verified (cf. 3.341), where necessarily
infinite proof trees are used (as stressed by Brouwer [20]). 28

Note that nothing has been established here on free choice sequences 
of proofs , that is, on extensions of the theory 2.21 corresponding 
to 2.5. 

2.7. Church's Thesis. Though the best proof theoretic results
(for example, 2.6244) depend on the discovery of special definitional
schemata, a general way of looking at the results 2.4-2.6 is this:
for the axioms considered we have a reduction of higher types to
arithmetic. The first nontrivial case concerns type 0° objects and is
expressed in the thesis:

Every constructive number theoretic function has an equivalent
definition by means of a certain kind of computation procedure.

The most familiar kind is: by means of recursion equations with a
suitable equation calculus. These equations can be numbered, and,
by [7], there is a primitive recursive relation T(n, m, p) which holds
if p is the code number of a computation from the equation n for
the value {n}(m) (of the function {n} defined by n).
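
The role of the relation T can be imitated on a toy scale. The Python sketch below is an analogy under loudly stated assumptions, not the equation calculus of [7]: "programs" are two invented step functions in a hypothetical list `PROGRAMS`, and a "computation" p is an explicit trace of states instead of a code number. What it shares with T is that checking a proposed computation needs only a bounded, mechanical inspection, whereas finding one may require an unbounded search.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]  # (remaining counter, accumulator)
Step = Callable[[State], Optional[State]]  # None signals "halted"


def double_step(s: State) -> Optional[State]:
    """Hypothetical program 0: computes 2 * m by counting down."""
    k, acc = s
    return None if k == 0 else (k - 1, acc + 2)


def loop_step(s: State) -> Optional[State]:
    """Hypothetical program 1: never halts."""
    return s


PROGRAMS: List[Step] = [double_step, loop_step]


def T(n: int, m: int, p: Tuple[State, ...]) -> bool:
    """True iff p is a complete halting trace of program n on input m."""
    if not p or p[0] != (m, 0):
        return False
    step = PROGRAMS[n]
    for cur, nxt in zip(p, p[1:]):
        if step(cur) != nxt:  # every transition must follow the program
            return False
    return step(p[-1]) is None  # and the last state must be halting


def U(p: Tuple[State, ...]) -> int:
    """Read the value off a halting trace."""
    return p[-1][1]


def run(n: int, m: int, max_steps: int) -> Optional[Tuple[State, ...]]:
    """Search for a halting trace, giving up after max_steps steps."""
    trace: List[State] = [(m, 0)]
    step = PROGRAMS[n]
    for _ in range(max_steps):
        nxt = step(trace[-1])
        if nxt is None:
            return tuple(trace)
        trace.append(nxt)
    return None


if __name__ == "__main__":
    p = run(0, 3, 100)
    print(p is not None and T(0, 3, p), U(p) if p else None)
```

In this analogy {n}(m) is U(p) for the trace p with T(n, m, p), when such a trace exists; for the non-halting program no trace passes the check.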

2.71. The Meaning of Church's Thesis. We begin with some
distinctions.

2.711. The fundamental distinction of [34] between finitist (combinatorial)
and constructive proof, carries over to combinatorial
and constructive definability, for example, of number theoretic
functions. The definitions of number theoretic functions of 2.22,
as intended, refer to abstract objects (higher types), and are only
later reduced to arithmetic. More technically: if one understands
the meaning of the symbols, the equations describe the functions
uniquely, and it is then a theorem that the particular calculus used
is complete.

28 These are in any case much closer to the way proofs, for example in number
theory, actually present themselves to us: the use of dots in proofs by induction is
not an accident!

The distinction between 2.712 combinatorially definable and 2.713 
finitist function is then this: a combinatorial rule is given (that 
operates only on finite configurations), and an arbitrary constructive 
proof of the termination of the procedure is accepted in 2.712, but 
only a finitist one (cf. Section 3) in 2.713. 

2.714. Most of the literature speaks of: effectively calculable,
instead of: constructive functions. Probably 2.712 (and not the
more general sense of constructive definability) was meant because
the implicit aim was to provide instructions suitable for a moron or
a mechanism. Perhaps a better word for 2.712 would be: mechanically
definable. Not (as far as I know): physically definable. For,
it is true (excepting collisions as in the 3-body problem, which introduce
discontinuities) the theory of partial differential equations
shows that the behavior of discretely described (finite) systems of
classical mechanics is recursive. But this may not be so in the
quantum theory, for example, of large molecules. 29

2.715. The support for Church’s thesis applies to 2.712, and not
to the general notion of constructive definability. It consists above
all in the analysis of machine-like behavior and in a number of
closure conditions, for example, diagonalization; cf. [7]. It certainly
does not consist in the so-called empirical support; namely
the equivalence of different characterizations: what excludes the
case of a systematic error? (Cf. the overwhelming empirical support
from ordinary mathematics for: if an arithmetic identity is provable
at all, it is provable in classical first order arithmetic; they all overlook
the principle involved in, for example, consistency proofs.)

29 In other words: is this theory a theory in the sense of Dirac [4], that is, one
which permits mechanical approximations? Consider (in the language of statistical
mechanics) the “cooperative phenomenon” (like boiling) of the mathematical
community and its asymptotic behavior with respect to arithmetic
problems. It looks pretty stable. Do we have a better theory of what answers
it will give to such problems than: according to arithmetic truth? This theory
is certainly not recursive.
2.72. The formulation of Church’s thesis. We express the
thesis in (extensions of) H_1 by

(a)(En)(m)[(Ep)T(n, m, p) & {n}(m) = a(m)].

2.721. If no variables for constructive functions are available,
for example, in H^1, we use the schema, for all A not containing free
Greek letters:

(x)(Ey)A(x, y) ⊃ (En)(m){(Ep)T(n, m, p) & (x)A[x, {n}(x)]}.

This follows in our systems from 2.72 since (x)(Ey)A(x, y) ⊃
(Ea)(x)A[x, a(x)].

2.722. By 2.714, 2.72 asserts a little more than Church’s thesis
since the axioms of H_1 are not especially evident if the a are interpreted
as constructive functions in the sense 2.712. Nevertheless

2.723. The systems of 2.4-2.6 remain consistent if 2.72, respectively
2.721, are added. The proof for systems not containing variables
for free choice sequences is straightforward. For other systems one
first applies the elimination results.

2.7231. As is to be expected from the intended meaning underlying
[8], the consistency of 2.721 for the system there discussed
(essentially H^1 with continuity and the bar theorem) is not shown
in [8]. However, by 2.6, 2.721 can be consistently added. Note
that [8] establishes the consistency of the rule: if the premise of
2.721 is proved, add the conclusion.

Evidently 2.72 allows a reduction to an arithmetic system. More 
generally, 2.4 constitutes a reduction of certain systems for all finite 
types by means of 

2.73. Extensions of Church’s thesis to all finite types. Here
we go back to the basic notions of higher type in 2.331 and distinguish
between

2.731. Applying Church’s thesis simultaneously at all types:
then 2.331 (i) (with extensionality) become the so-called effective
operations of [46], and in 2.331 (ii) all neighborhood functions considered
are recursive. By [46] (4.21) the same classes of functionals
are obtained. 30

2.732. Applying Church’s thesis only at highest type, that is, at
type σ (= 0^τ say), one requires a recursive computation rule, but
the objects of type τ are only assumed to be constructive, not themselves
recursive. In case 2.331 (ii), where continuity is assumed,
the results are fairly complete. One considers free choice sequences
satisfying the conditions C_τ for neighborhood functions of type τ;
since these are formulated in H^1 one has a reduction to intuitionistic
analysis. The most interesting conclusions concern bar recursion
(cf. 4.3 below). The case 2.331 (i) is more delicate. The most
interesting extension of Church’s thesis to this case is in [44], but
unfortunately it contains no analysis of the axioms for functions of
higher type used in the proofs of [44]: so it is not yet known if any
nontrivial results stated there are valid on the basis of evident
properties of constructive functions of finite type.

30 Note, however, that the equivalence proof in [46] is nonconstructive, cf.
negative continuity of [20].

2.7321. At low types the properties of the functionals of [44], 
that is, defined by S1-S9, are satisfactory; for example, for type 2, 
= 0®, if such a functional can be proved to be defined for all type 
0° objects (not assuming Church’s thesis), it is defined for all free 
choice sequences. For, we have no axioms which exclude the 
possibility that there is a rule for each free choice sequence. But 
if one proves that a type 0 2 object is defined for all type 2 objects, 
one cannot make use of the continuity properties of type 2 objects. 
This is satisfactory because we have no axioms which ensure that all 
constructive functions of type 2 are continuous. 

2.7322. Regarded as existential axioms, the schemata S1-S9 of 
[44] are weak (without additional axioms about functions of higher 
type) because, for example, the effective operations of 2.731 are 
closed under these schemata, by [46], 4.32. 

2.74. Formally speaking, Church’s thesis plays a somewhat
similar role in present-day intuitionistic mathematics as the ramified
hierarchy in set theory. Not only is it consistent with the known
axioms, but it can also be used to show the formal character of
interesting open questions; cf. page 109. One important result
(mentioned in 2.732) is given in 4.32. Another is this.

2.741. The rules of intuitionistic predicate logic cannot be proved
complete by any methods consistent with Church’s thesis. For, on
Church’s thesis, there is a primitive recursive binary tree (spread)
B, such that (a)_B(Ex)A(āx), where a_B ranges over 0, 1 sequences,
but ~(α)_B(Ex)A(ᾱx), that is, by 2.524, ~[(α)_B ~~(Ex)A(ᾱx) ⊃
(α)_B(Ex)A(ᾱx)]. By [48] this implies incompleteness.

2.7411. A Distinction. In terms of intuitionistic primitive
notions, there are two meanings of completeness of a set of formal
rules: (i) every proof for an assertion (in the given language) is
(among those represented by) some formal derivation; (ii) every
assertion which is provable at all also has a formal derivation.
Only (ii) can be expected; for summary, cf. [48], and the above
incompleteness applies of course to (ii). It may be remarked that
even in the propositional case no completeness results for nonnegative
formulae are known without the use of absolutely lawless
sequences of [47]. It would be interesting to study this matter.

2.75. Is Church's Thesis Valid? More precisely, is 2.72 valid,
where a ranges over all constructive number theoretic functions,
and not only combinatorially definable ones in the sense of 2.712,
since the latter are not basic for intuitionistic mathematics, as
expounded here. Evidently there is no reason why the question
should not be decidable by means of evident axioms about constructive
functions. After all, work was needed to establish 2.723!
It might have turned out to be false. The discovery of axioms
about constructions which are inconsistent with Church’s thesis is
certainly one of the really important open problems.

Among the subjects discussed above, the most likely candidate 
is 2.613 for arbitrary monotone A. In view of the conjecture in 
2.625, the axioms for higher type so far formulated are probably 
consistent with Church’s thesis. 

2.8. Constructive Analogue to the Cumulative Theory of Types.
The following set of axioms is offered as a formulation. It may be
redundant, that is, some of the functors introduced may be obtainable
from the other axioms. More important, some obvious principle
may have been overlooked; all that is done is to indicate a kind
of theory. It is the natural analogue of the theory of types in 1.1.

Variables a, b, ... over constructions; constants 0, 1; functors
T (the type of a construction), D, D_1, D_2 (for forming sequences),
E (decision function for =), application, and the relation symbols
=, < (partial order for constructions which are ordinals). The
monadic predicate symbol O is defined by: O(a) ↔ (0 < a v a = 0).

Intuitionistic Predicate Logic. Term formation as in 2.2. Instead
of extensionality: E(a, b) = 0 v E(a, b) = 1, E(a, b) = 0 ↔
a = b. Inductive definition of <: instead of the least element principle,
the (classically equivalent) contrapositive, and restricted
trichotomy (b < a & c < a) ⊃ (b = c v b < c v c < b); however,
these are not axioms but consequences of the following generalized
inductive definition:

0 < 1, (a < b & b < c) ⊃ a < c, a < D(a, 1) (successor)

(0 < b < c < a)[d(b) < d(c)] ⊃ (b < a)[d(b) < D(d, a)] (supremum).

Corresponding principle of proof for arbitrary binary relations
Q(a, b).

Type structure:

O(Ta); T0 = 0, T1 = 0, Ta < T[D(a, b)] = T[D(b, a)]

(these may be insufficient: O(a) ⊃ Ta = a?).

Comprehension: For each term t[x],

(Eb)(c)(Tc < Ta ⊃ b(c) = t[c]).

Type limitation: a(c) ≠ 0 ⊃ Tc < Ta

Axiom of choice: for each formula A(x, y)

(b)[Tb < Ta ⊃ (Ec)A(b, c)] ⊃ (Ec)(b)(Tb < Ta ⊃ A[b, c(b)])

Existence of an infinite ordinal:

0 < ω, (a)(a < ω ⊃ <a, 1> < ω).

2.811. About the axiom of choice above: construe all c-cum-type-a
as functions of both b and the proof of Tb < a. In the recursive
case (for a ∈ O of 2.6141) the primitive recursive functions obtained
by the recursion theorem correspond to functors in the abstract
theory which are not subject to type restriction.

2.812. In Brouwer’s theory of ordinals the supremum axiom is
assumed only for a = ω.

2.813. I have not checked whether the principles above allow one
to derive ordinary induction for a suitably defined ω; if not, one
must regard the existential axioms for ω as an inductive definition.
From this one can then develop enough elementary arithmetic to
state Church’s thesis.

2.82. Is Church's Thesis Consistent with This Theory? Two
comments. On page 121: in view of the elimination of free choice
sequences, the theory may be as adequate for present-day intuitionistic
mathematics as the cumulative type theory for classical
mathematics, but this is not yet known. Also, to make it really
constructively evident, a more elaborate analysis of transfinite
types, as in [67], may be necessary. Second, in the formulation
above (use of E) all terms must be assumed to be fully defined [with
the convention: a(c) = 0 if a(c) is not defined (for a, c, as given),
to satisfy type limitation; in contrast to 2.15]. Thus, for instance,
there is no direct contact with so-called higher number classes [58]
where partial recursive increasing sequences are assumed to have a
supremum. Most of the literature on them is nonconstructive;
the discovery of laws for such partial operations (correct on the
intuitionistic conception) may lead to quite a new view of the subject;
cf. footnote 16. (They may be those of current intuitionistic
logic itself where = is decidable only at type 0.)

3. PROOF THEORY 

The reader has probably a clear idea of what a formal system
is (axioms and inference rules); for a systematic treatment cf. [13].
But he has probably never used one: for instance, to convince
himself that (a statement) A is not a formal consequence of B, he
generally finds a model (= example) of B in which A is false, and
rightly assumes that if the formal rules are sensible, A cannot be
formally derived from B; cf. page 2.121. In contrast, in the present
section rules will play a central role; cf. 3.24.

Notation: Throughout this section we use symbols ¬, ∧, ∨, ⋀, ⋁
for the “logical constants” of (possibly uninterpreted) formal systems,
the notation of Section 1 for truth functions, and of Section 2
for intuitionistically interpreted formal systems.

As explained in the introduction the basic formalist contention is
that the inherent structure of intuitive proofs can be represented in
terms of formal manipulations of symbols. As far as the set of
provable statements of elementary logic is concerned, suitable rules
were given by Frege in the last century; the possibility of obtaining
all intuitively valid logic rules was first verified empirically, and
much later given a theoretical justification in the completeness
theorem (1.835). The mere generation of the theorems of a branch
of mathematics by means of formal rules may be called crude
formalization. 31

The mere existence of such rules does not explain (or justify)
them; this can be seen as follows. Suppose we have a formal derivation
of the sequence of seven symbols: a + b = b + a. Why are we
to expect that addition of numerals (by correct application of the
usual computation rules) is independent of order? In other words:
how is the formal derivation connected with the application we make
of it, even, as here, within mathematics? 32

It is not immediately apparent how this question is to be significantly
formulated in a formalist framework. As mentioned on
page 97 there is the possibility of a conflict. Old-fashioned formalism
(for a revival and forceful exposition, cf. Bourbaki) rejects
the question (at least as being capable of primarily theoretical
treatment) and proposes to treat it on an empirical (statistical)
basis.

Trivially, as long as all formal rules are considered “theoretically 
equal,” one cannot formulate questions of their evidence™ Hilbert 
formulated a coherent proposal for treating such questions which is 
described below. It can be judged, of course, on its merits; how¬ 
ever, if one has been weaned on old-fashioned formalism, it is helpful 
to consider first the weakness of the latter position. First, in 
practice, the reliability of mathematical principles is not studied 

31 The conviction, probably, is not merely that such rules happen to generate
the provable statements in a particular domain of mathematics, but that
(despite appearances, footnote 28) this is really all that goes on; just as in the
early applications of the molecular theory in chemistry one did not merely
mean that the particular integral ratios in chemical reactions happen to be
formally explained by an atomic hypothesis, but that there were such things as
atoms. As observed above, the formalist contention does not get much support
from naive observation; officially, it was proposed on general philosophical
grounds for its freedom from metaphysical (ontological) assumptions of the
existence of abstract objects such as sets (Section 1) or mental acts (Section 2).
But, probably, its attraction derives at least partly from this: long before electronic
computers were thought of, one could see more or less how behavior
according to such formal rules could be realized by a mechanism, that is, an
old-fashioned mechanism in the sense of a Turing machine (cf. footnote 30). Thus
it gave the best hope of a mechanistic theory of reasoning.

32 Note for reference: the question arises even for small numbers, say a, b < 20,
if we have not tried out all 190 cases.

33 As one says: the theory lacks the multiplicity of the phenomena to be
explained, cf. footnote 3.

statistically; if it were, it would be essential, before accepting the
principle of the least upper bound, for example, to find out whether
the conclusions one has actually drawn from it do not use very
special instances only (cf. page 118). Second, the matter is problematic
even if only short expressions are considered, cf. footnote 32.
Third, and more important, no attempt has been made even to state the
principles to be used in such a statistical study. One suspects that
this would lead back to just the kind of question on evidence
which is being rejected. Incidentally, a serious analysis of the
matter should be very interesting.

3.1. Hilbert's Program. The remark about the commutative law
(or, for that matter, about statistical inference) shows that the
crude formalization already described has to be supplemented by an
analysis of principles of evidence; for positive results these will be
codified in particular formal systems whose formal derivations (are
recognized as representing proofs which) have this kind of evidence
(cf. page 122). Then the problem is: Suppose we are given a codification
(in the crude sense above) of a branch of current mathematics; to
reduce it to proofs of the particular kind considered.

3.11. What form should the reduction take? Hilbert’s basic 
considerations were (more or less) these: 

3.111. The type of reasoning which constitutes the most elementary
analysis of such things as the commutative law is of the
kind expounded systematically by Grassmann, and codified, for
example, in primitive recursive arithmetic; it refers to purely concrete
situations, in our particular case to the very rules for computing
sums. This is a type of reasoning used in all theoretical work, i.e.
constitutes a minimal presupposition. So if the reduction, in whatever
form it is formulated, only requires this kind of reasoning, it
has the maximal kind of evidence that can be reasonably demanded.
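
The concrete character of such reasoning can be illustrated by a small sketch. It is not part of the original text: numerals are modelled by Python integers, the names `succ`, `add` and `check_commutativity` are hypothetical, and `add` is assumed to follow the usual recursion equations a + 0 = a, a + S(b) = S(a + b). The check simply tries out every pair a < b < 20.

```python
from itertools import combinations

def succ(n):
    """Successor: the numeral following n."""
    return n + 1

def add(a, b):
    """Addition via a + 0 = a, a + S(b) = S(a + b), unrolled as a loop:
    apply the successor b times, starting from a."""
    result = a
    for _ in range(b):
        result = succ(result)
    return result

def check_commutativity(bound=20):
    """Try out every pair a < b < bound; return (cases tried, all equal?)."""
    pairs = list(combinations(range(bound), 2))
    return len(pairs), all(add(a, b) == add(b, a) for a, b in pairs)

print(check_commutativity())  # (190, True)
```

Such a check only settles the finitely many cases tried; it is the elementary proof by induction on the recursion equations, not the enumeration, that covers all numerals.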

3.112. The property of being a formal derivation in a formal
system, or the relation between premise and conclusion in an inference
rule, is of the same arithmetic kind (use any of the familiar
codings of finite strings of symbols by means of numbers). 34 In
consequence we have an arithmetic relation $\mathrm{Prov}_F(n, \ulcorner A \urcorner)$: n is
a formal derivation by the rules F of the formula A (and $\ulcorner A \urcorner$ is a
term whose value is the number of A).
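
One of the familiar codings can be sketched as follows. This is an illustration under stated assumptions, not the coding used in the text: the symbol set `ALPHABET` is hypothetical, and strings are read as numerals in base `len(ALPHABET) + 1` with digits 1, 2, ... (the digit 0 is never used, so distinct strings receive distinct numbers).

```python
ALPHABET = "0S+=()ab"      # hypothetical symbol set, chosen only for illustration
BASE = len(ALPHABET) + 1   # digits 1..len(ALPHABET); the digit 0 never occurs

def encode(string):
    """Number of a finite string of symbols (the empty string gets 0)."""
    n = 0
    for ch in string:
        n = n * BASE + ALPHABET.index(ch) + 1
    return n

def decode(n):
    """Recover the string from its number; reject numbers that code nothing."""
    symbols = []
    while n > 0:
        n, digit = divmod(n, BASE)
        if digit == 0:
            raise ValueError("not the number of any string")
        symbols.append(ALPHABET[digit - 1])
    return "".join(reversed(symbols))

code = encode("a+b=b+a")
print(code, decode(code))
```

With such a coding, syntactic properties of strings (being a formula, being a derivation) become properties of numbers that are checked by elementary computation.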

34 Or, conversely, represent numbers by strings of symbols! (Concatenation
theory.)

3.113. Basic Point. It is clear that, for the intended interpretation
of the formal languages in Sections 1 and 2, most formulae do
not have purely combinatorial significance at all, because they refer
to abstract situations, for example, if they have quantifiers over
sets, or (implied quantifiers in) implications in Section 2. These
formulae Hilbert called ideal elements of the language used to round
it out. But certain relations are intended to correspond to combinatorial
ones; precisely, for an elementary relation P(n), we may
have a set theoretic or intuitionistic relation represented formally
by $\mathbf{P}(x)$, the term $S^n 0$ denoting the nth numeral, and we have
(elementary) proofs of:

3.1131. $P(n) \Rightarrow \mathrm{Prov}_F[\pi_P(n), \ulcorner \mathbf{P}(S^n 0) \urcorner]$

3.1132. $\text{not } P(n) \Rightarrow \mathrm{Prov}_F[\bar{\pi}_P(n), \ulcorner \neg \mathbf{P}(S^n 0) \urcorner]$

where $\neg$ is the (formal) negation symbol of F, and $\pi_P$, $\bar{\pi}_P$ are (elementary)
functions which essentially describe how the intended computation
of P(n) can be mimicked in the abstract theory. Formalization
before Hilbert had actually provided such proofs (this is nothing else
but the development of arithmetic in set theory, and so on). Now
reducing proofs of assertions of this combinatorially significant kind to
combinatorial ones is expressed (for the formal variable x of F) by:

3.1133. $\mathrm{Prov}_F[m, \ulcorner \mathbf{P}(x) \urcorner] \Rightarrow P(n)$

This statement now is formulated entirely by use of purely combinatorial
terms, and therefore there is at least a possibility of proving 3.1133
on the minimal assumptions mentioned in 3.111.

Hilbert also emphasized the amusing fact that, granted 3.1131
and 3.1132, 3.1133 is an immediate corollary of the consistency
statement (with variable formula A), variables m and n over (numbers
of) formal proofs:

$\mathrm{Prov}_F(m, \ulcorner A \urcorner) \Rightarrow \text{not } \mathrm{Prov}_F(n, \ulcorner \neg A \urcorner)$.
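
The step from consistency to 3.1133 may be spelled out roughly as follows. This is a reconstruction, not a quotation: corners denote the number of a formula, bold P is the formal representative of P, the function sub (giving the derivation obtained by substituting the nth numeral for x) and the function supplied by 3.1132 (written with a bar) are names assumed here, and only 3.1132 and the decidability of the elementary relation P are actually used.

```latex
\begin{align*}
&\text{Assume } \mathrm{Prov}_F[m, \ulcorner \mathbf{P}(x) \urcorner]
  \text{ and, for some given } n, \text{ not } P(n). \\
&\text{Substituting the numeral } S^n 0 \text{ for } x:\quad
  \mathrm{Prov}_F[\mathrm{sub}(m, n), \ulcorner \mathbf{P}(S^n 0) \urcorner]. \\
&\text{By 3.1132:}\quad
  \mathrm{Prov}_F[\bar{\pi}_P(n), \ulcorner \neg \mathbf{P}(S^n 0) \urcorner]. \\
&\text{So both } \mathbf{P}(S^n 0) \text{ and } \neg \mathbf{P}(S^n 0)
  \text{ are derivable, contradicting consistency.} \\
&\text{Since } P \text{ is decidable, } P(n) \text{ follows.}
\end{align*}
```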

3.12. Some Pragmatic Comments. (a) It is clear that there might
be ambiguity in what exactly constitutes an elementary (consistency)
proof. If one really expects to find one, one does not
split hairs beforehand, but gives a precise formulation only after
one has found such a proof. The same applies, for example, if one
asks for a practical or nice solution of any other problem. Only
when it comes to negative results is an analysis needed. (b) The
same applies to possible ambiguities in the arithmetic representation
of the combinatorial relation: $\mathrm{Prov}_F(n, \ulcorner A \urcorner)$. 35 (c) Of course, a
proof of consistency is a minimum requirement for the kind of
reduction required; but the central role of consistency depends on
the particular kind of evidence here studied where only purely combinatorial
assertions are “real.” In a proof theory which uses less
elementary methods (and therefore is useful for a reduction to a less
elementary kind of evidence), it would be quite unreasonable to
give a special place to the consistency problem. (d) In the light
of past research the consistency problem is recognized as technically
difficult. If one really holds the mechanistic conception, one will
of course regard it as important, but not necessarily difficult; just
as, when one believes the equations of classical mechanics to be
true, one will regard as important, but not necessarily difficult, the
verification that initial positions and velocities determine future
behavior (subject to given forces).

3.2. Basic Facts about Formal Systems (limitations and general 
considerations on the choice of systems). Throughout this section 
we use the following 

3.21. Distinction. Instead of considering formal systems, that is,
a collection of rules, one sometimes considers equivalence classes,
where systems with the same set of theorems are identified. This
is often appropriate for questions which refer just to the set of
theorems, for example, whether it is recursive [16] or saturated,
that is, for each (closed) formula A, either A or $\neg A$ is a theorem.
In this case, the natural requirements 3.1131 and 3.1132 for representability
of a property P(n) may be weakened: $P(n) \Rightarrow\ \vdash_F \mathbf{P}(S^n 0)$
and $\text{not } P(n) \Rightarrow\ \vdash_F \neg \mathbf{P}(S^n 0)$ are only required to be true for each n
(in the classical case; and constructively, not necessarily elementarily,
provable in the intuitionistic case). Much of this work
uses little more than the recursive enumerability of the set of
theorems of the formal systems considered, and is therefore best
treated as part of recursion theory, cf. [13]. (However, there are
amusing differences of detail [25].) For finitistic refinements, cf.
3.3042.

3.22. Rules have to be considered when one is interested in 

35 This ambiguity is liable to be overlooked if one assumes that, for two combinatorially
defined P(n), P'(n), either one can find $n_0$ with $P(n_0) \not\Leftrightarrow P'(n_0)$, or else
there is an elementary proof of $P(n) \Leftrightarrow P'(n)$.

questions of evidence since the equivalence of different sets of rules
may simply not be evident. Obviously, if $\mathrm{Prov}_F(n, m)$ is one
representation of the proof relation for one of the usual systems, and
Prop (m, m') expresses that m is propositionally equivalent to the
negation of m' then

3.221.

$\mathrm{Prov}_F(n, m) \wedge \bigwedge p < n\, \bigwedge m'\,[\mathrm{Prop}(m, m') \rightarrow \neg\,\mathrm{Prov}_F(p, m')]$

is another representation; for 3.221, consistency is trivial since
derivations leading to inconsistencies are forbidden. (N.B.: if A
and $A \rightarrow B$ are provable in sense 3.221, not necessarily B!)
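
The effect of the modified proof relation can be seen on a toy example. Everything in the following sketch is hypothetical and chosen for illustration: derivations are simply positions in a short list, formulae are strings, the list is deliberately inconsistent, and `prop` is a crude syntactic stand-in for Prop (literal negation instead of propositional equivalence).

```python
# Toy enumeration: derivation number n derives the formula at position n.
# The list is hypothetical and deliberately inconsistent ("~B" and "B" occur).
DERIVATIONS = ["A", "A->B", "~B", "B"]

def prov(n, m):
    """Plain proof relation: derivation n derives formula m."""
    return 0 <= n < len(DERIVATIONS) and DERIVATIONS[n] == m

def prop(m, m2):
    """Crude stand-in for Prop(m, m'): one string is '~' followed by the other."""
    return m == "~" + m2 or m2 == "~" + m

def prov_restricted(n, m):
    """Derivation n derives m, and no earlier derivation p < n derives
    a formula m' standing in the relation prop to m."""
    return prov(n, m) and not any(prop(m, DERIVATIONS[p]) for p in range(n))

def provable(m, relation):
    """m has some derivation in the sense of the given proof relation."""
    return any(relation(n, m) for n in range(len(DERIVATIONS)))

for formula in ["A", "A->B", "~B", "B"]:
    print(formula, provable(formula, prov), provable(formula, prov_restricted))
```

In the restricted sense A, A->B and ~B are provable but B is not: consistency holds trivially, at the price of losing closure under detachment.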

3.222. Elimination of ambiguities (cf. footnote 35). If a precise
analysis of formal systems is given [13] (ultimately: a finite set of
rules for generating possibly infinitely many axioms), the canonical
representation (c.r.) in F of the rules is determined up to formal
equivalence in F by this: the rules have the form of an inductive
definition (2.612, 2.613 with quantifier free positive A) of the
property: A is formally derivable; then the formula $\mathbf{P}(x)$ of F
represents this property canonically if 2.612, 2.613 are derivable in
F for $\mathbf{P}$ in place of P. Then with a minimum of arithmetic in F,
for two c.r. $\mathbf{P}$, $\mathbf{P}'$ we have $\vdash_F \bigwedge x\,[\mathbf{P}(x) \leftrightarrow \mathbf{P}'(x)]$.

3.2221. To apply to quantifier free systems, minor modifications 
must be made. 

3.2222. It should be remarked that though a theory of c.r. is 
clearly possible it has never been worked out in full in the literature: 
an elegant exposition would do much to make the elements of proof 
theory more systematic. 

3.2223. By 3.12, these general considerations are not specially 
needed for positive results, to be given in 3.3. But they help to 
formulate the 

3.23. Fundamental Incompleteness Result [32]. Suppose F is a
formal system, consistent, and a certain minimum of arithmetic can
be developed in F (for precise conditions, cf. [13]; in particular,
the proof relation itself is represented in the sense of 3.21). Then

3.231. There is an elementary, even primitive recursive, property
P represented by $\mathbf{P}(x)$ such that neither $\vdash_F \bigwedge x\,\mathbf{P}(x)$ nor $\vdash_F \neg\bigwedge x\,\mathbf{P}(x)$;

in fact, for each n, P(n) is true (because if $\vdash_F \neg\mathbf{P}(S^n 0)$ for some n, then
$\vdash_F \neg\bigwedge x\,\mathbf{P}(x)$ in the F considered).

3.2311. While there is an axiomatic characterization of the notion
of natural number (relative to the notion of set) in the sense of
1.32, there is no deductive characterization (by means of a formal
system) in the sense of yielding precisely the true statements in the
usual notation. (Axiomatic characterizations in 1.32 are so-called
second order axiomatic systems.)

3.232. Under essentially the same conditions as in 3.231, there
is a formula A such that, if Prov (n, m) represents the proof relation,
not $\vdash_F \bigwedge x\,[\mathrm{Prov}(x, \ulcorner A \urcorner) \rightarrow A]$.

3.2321. For the usual proofs, A is $\bigwedge x\,\mathbf{P}(x)$ of 3.231 because it
expresses its own nonderivability: $A \leftrightarrow \bigwedge x\, \neg\,\mathrm{Prov}(x, \ulcorner A \urcorner)$.

3.233. Under essential use of provability of closure under detachment
(cf. 3.221) and stronger representability conditions [52], but
certainly for c.r. Prov, $\vdash_F \bigwedge x\,[\mathrm{Prov}(x, \ulcorner A \urcorner) \rightarrow A]$ if and only if
$\vdash_F A$.

3.2331. Cor. Taking 0 = 1 for A, the canonical expression for
the consistency of F cannot be proved in F.

3.231 is the natural formulation of inadequacy of formal systems
for the notion of arithmetic truth; but it is not sufficient for inadequacy
with respect to intuitive provability because one has no
reason to suppose that, for each combinatorial P, P(n) with free
variable n is intuitively provable if it is true for each n.

3.234. Discussion of 3.232 for the Mechanistic Conception. We
cannot construct a formal system with the following two properties:
(a) all elementary proofs are represented in it; (b) its formal consistency
is provable by elementary means. But both would be
needed to support the mechanistic conception in its full sense: (a)
is needed to say that a formal system is adequate for the representation
of all elementary proofs; and (b) for the reduction to elementary
evidence. But 3.232 does not exclude a practical validity for the
mechanistic conception with a kind of transcendental singularity:
(a) is false, but for all axioms for which we believe we have an
abstract interpretation, (b) holds.

The general feeling has long been that the opposite is true; namely 
that classical arithmetic, and certainly classical analysis, satisfies 
(a). This requires a precise characterization of elementary proofs. 
3.4 and 3.5 contain some relevant work and 3.6 open problems. 

3.24. Technical Considerations about Formal Systems (rules versus
axioms). The reader is probably most familiar with certain kinds
of formal systems, namely so-called axiomatic theories in standard
formalization [16] consisting of axioms and valid rules of inference,
that is, schemata for which we recognize that, if the premises are
true so are the conclusions. This type of formalization is suitable
when one wants to make as evident as possible that all theorems of
the system are valid (for the intended interpretation). A totally
different consideration is needed if one wants to characterize some
such notion as elementary or combinatorial proof; specifically, if
one reviews

3.241. Formalization in the light of the incompleteness theorem
3.2331. Formal systems (having a recursively enumerable set of
theorems) are not suitable for characterizing the notion of arithmetic
truth. If they are to be used for characterizing some (informal)
notion of proof, it is essential that each formal theorem represent an
(informally) provable result, but that this fact be not recognizable by
means of such an informal proof. What this requires is just the
opposite to the more familiar aim of formalization: we want rules
whose correctness, for the notion considered, depends essentially on
the fact that the premises have been derived in a particular way.
To play safe (for avoiding conflict with 3.2331) one may have to use
symbols which are not interpreted for the informal notion considered;
for example, in 3.4 we shall use a symbol that is a kind of
mnemonic.

3.2411. Naturally, the use of rules rather than axioms recommends
itself also from different points of view, for example, [31].
And, purely pragmatically, it is clear that by using simple axioms
and many rules of a simple structure (instead of few complicated
axioms) one may make it evident that many interesting properties
of the premises are inherited in the conclusions, and so, by induction,
one shows that all formal theorems have these properties. This is
often more useful for independence proofs than the construction of
models (cf. 3.3232).

3.242. What rules should one study? Practically all results
below are stated for intuitionistic systems. First, it turns out
(3.31) that the proof theoretic relations for the corresponding classical
systems are read off simply by considering negative formulae (2.33)
(while the converse is not true; thus the monadic intuitionistic
predicate calculus is undecidable while the classical one is decidable
[49]). So, the true generality of the results is hidden if one formulates
them for the classical case only. It would be interesting to
have a quite abstract formulation for general formal systems which
isolates the structural properties of the (intuitionistic) rules actually
relevant to the main results. But this has not yet been done.
Second, in the classical case one often gets substantially simpler
semantic arguments for the results, at least if they are formulated
in the traditional manner, in other words, by using the semantic
validity of the rules in the sense of 1.831. The use of (the corresponding)
2.311 has not yet led to similar simplifications, and so, in
the intuitionistic case, proof theoretic methods are not artificial.

3.243. A matter of taste: even when our primary interest is in
getting particularly elementary proofs of a result, we try to reformulate
the result so as to force the proof to be elementary. This is
illustrated by the following stages of the consistency problem (for
precise definitions, cf. 3.332). For any of the ordinary systems of
Section 1 or 2, the problem is trivial without restrictions on the
method of the consistency proof because the systems are valid under
the intended interpretation. The big step is to ask another question
which implies consistency in an elementary way: can the usual
rules be replaced by so-called cut free rules? This is not immediate,
but almost so (in the classical case): one shows the completeness of
the cut free rules and concludes elimination from the validity of
the usual rules. But if we ask: on what measure of complexity of a
proof with cuts does the length of the corresponding cut free proof
depend? then the natural solution is by elementary means. Such
“computational” reformulations have been useful in the past for
locating errors in syntactic arguments, for example, [17] by [6],
page 123, or [38] by [24]. Cf. also 3.351 below.

3.2431. It is familiar from ordinary axiomatic mathematics
that sharpened formulations may force proofs to be more elementary:
a first order statement about rationals may be true but
we have no idea how to prove it; if the same statement is true for
all real fields, it is provable by first order predicate logic (from the
axioms for formally real fields).

3.3. Technical Proof Theory. Most of the early work in proof
theory was done in support of Hilbert's program and formulated
accordingly. Now, first, Hilbert’s finitist program can be carried
out only for a limited range of mathematics; if this is to be extended
at all, one must consider a reduction to (significant) principles of
evidence other than finitist, and by 3.12(c) and 3.2 the consistency
problem has to be widened. Second, extant proof theory actually
establishes more than consistency. Naively, it might be supposed
that each piece of proof theory has to be considered “on its merits.”
But it turns out that significant results are conveniently summarized
in terms of the notion of interpretation (of a system F in $F_0$). The
syntactic relations will be established in a (partial) codification $F_1$ of
interpreted proofs, in which the proof relations for F, $F_0$ have the
c.r. Prov(n, m), $\mathrm{Prov}_0(n, m)$ respectively. For clarity, bold-face
letters are used for expressions of (possibly) uninterpreted formal
systems, and when the context makes the meaning clear, formulae
are identified with their numbers and corners dropped. (a) In
contrast to the consistency problem, one may consider a class $\mathfrak{A}$ of
formulae in F other than c.r. of combinatorial properties. (b) If
then $\mathfrak{A}$ contains quantified formulae and $F_0$ does not, simple examples
suggest that with each A of F one should associate a disjunction
$A_1 \vee A_2 \vee \cdots$ in $F_0$, for example for $A = \bigvee x\,B(x)$: $B(0) \vee B(S0) \vee \cdots$.
(c) Finally, the interpretation should apply not only to
proofs in F itself, but to the use of F for derivations from additional
axioms ($A \vdash B$: B is derived in F from A). Other properties of the
notion are mentioned after the definition:

An $F_1$ interpretation in $F_0$ of a class $\mathfrak{A}$ of formulae in F is determined
by function terms $\mu$, $\pi$, $\sigma$ (of $F_1$) with the following properties
(n is a variable over formal proofs of F; A, B over formulae of $\mathfrak{A}$).

3.301. For c.r. A of combinatorial statements, $A \in \mathfrak{A}$ and also,
for each n, $\mu(n, A)$ is the corresponding c.r. in $F_0$.

3.302. $\mathrm{Prov}(n, A \vdash B) \Rightarrow \mathrm{Prov}_0[\pi(n, m), \mu(m, A) \vdash_0 \mu\{\sigma(n, m), B\}]$
can be proved in $F_1$; taking for A (B) a true (false) combinatorial
formula, we have from 3.301 and 3.302

3.3021. $\vdash_1 \mathrm{Prov}(n, B) \Rightarrow \mathrm{Prov}_0[\pi'(n), \mu\{\sigma'(n), B\}]$

3.3022. Consistency $F_0 \vdash_1$ Consistency F

Note. Ad (a): 3.3022 shows that we really have a widening of the
consistency problem. Ad (b): $\mu(i, A)$ is $A_i$. Ad (c): 3.302 expresses
that $B_1 \vee B_2 \vee \cdots$ follows in $F_0$ from each $A_i$, that is, from
$A_1 \vee A_2 \vee \cdots$.

3.303. If $F_0 \subset F$, an $F_1$ interpretation is called proper or discriminating
if it satisfies in addition to 3.301 and 3.302 for $A \in \mathfrak{A}$

$\vdash_1 \mathrm{Prov}[p(n, A), \mu(n, A) \vdash A]$,

that is, A is no stronger than $A_1 \vee A_2 \vee \cdots$ with respect to deduction
in F. This permits a precise formulation of the relative subtlety
of two formal systems, cf. 3.3231.

3.304. Simplifications. Suppose all the nonelementary definitions
of the syntactic argument have been “built into $F_0$”; precisely,
suppose $\pi$, $\mu$, $\sigma$ are quite elementary, for example, primitive recursive
(PR). Then, if (one can prove in PR:) $\mathrm{PR} \subset F_1 \subset F_0$ and if there
is an $F_1$ interpretation then there is also a PR interpretation, that is,
3.302 (which, by hypothesis, is stated in PR) can be proved in PR.

For, first generally (3.113), for $A \in \mathrm{PR}$, if $\vdash_1 A$ then: (Consistency
$F_1$) $\vdash_{\mathrm{PR}} A$; second, from an inconsistency in $F_1$ (and hence
in $F_0$ since $F_1 \subset F_0$) we get a PR description of a formal proof in $F_0$
of any formula of $F_0$, in particular of $\mu(m, B)$. By combining the
steps we have the result.

3.3041. If $F_1$ contains quantifiers, 3.302 simplifies to

$\mathrm{Prov}(n, A \vdash B) \supset (m)(Eq)(Ep)\,\mathrm{Prov}_0[p, \mu(m, A) \vdash_0 \mu(q, B)]$

(where the notation of Section 2 is chosen because this is the most
important case in applications).

3.3042. A formal instance of our interpretations is this: both F
and $F_0$ contain the whole of classical logic (and are therefore of no interest
in the present context), but not only the consequence relation
(3.302), but all logical connectives are required to be preserved by
the mapping $\mu$. These are the interpretations in the special sense
of [16]. Though, by [13], the additional requirements are useless
for the applications of [16], amusing results have been discovered
for this notion in [26].

But the most interesting simplification occurs if we have 

3.305. Reflexiveness of $F_1$ over $F_0$ with respect to $\mathfrak{A}_0$, that is, for each
$\mathbf{A} \in \mathfrak{A}_0$ (possibly containing the free variable m) there is a corresponding
(interpreted) formula A of $F_1$ and

$\vdash_1 \mathrm{Prov}_0(n, \ulcorner \mathbf{A}[S^m 0] \urcorner) \supset A[m]$.

It is understood that $\ulcorner \mathbf{A}[S^m 0] \urcorner$ is a term of $F_1$ (with variable m) which
is the c.r. of the function giving the number of the
formula obtained by replacing x in $\mathbf{A}[x]$ by the mth numeral.

3.306. Suppose each $\mu(n, B)$ is an instance of a quantifier-free
schema $B_0(s, t)$, that is, obtained by replacing s by a constant
(functional) term of $F_0$. Then 3.3021 reduces to:
$\vdash_1 \mathrm{Prov}(n, B) \supset (Es)(t)B_0(s, t)$,
which is the form of the interpretation of logical
constants of [34] (cf. 2.321) used throughout Section 2. It
satisfies 3.302.

3.31. The interpretation problem (3.3), like the consistency problem,
refers to a, possibly proper, subclass $\mathfrak{A}$ of formulae in F,
but proofs in F of A ($A \in \mathfrak{A}$) may contain formulae not in $\mathfrak{A}$. So
even if we have an interpretation in mind, an inductive proof of
3.301-3.302 may be difficult. There are two methods:

Either to extend the interpretation to all formulae of F or to
reformulate the rules of F so that in the new system F* a proof of
A ($A \in \mathfrak{A}$) consists only of formulae in $\mathfrak{A}$ itself.

A very simple, but particularly useful, example of the former
is this. F is classical (predicate logic or) arithmetic, and $F_0$ intuitionistic
arithmetic. One takes for $\mu(A)$ the negative version $A^-$ of A
(2.3322) and verifies that the rules of F are derived rules for the negative
fragment of $F_0$. 3.302 follows because, for closed A, by the
deduction theorem, $A \vdash B$ if $\vdash A \rightarrow B$, and $A^- \rightarrow B^-$ is $(A \rightarrow B)^-$.
This applies if $\mathfrak{A}$ consists of negative formulae only.
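
A translation of this kind can be sketched as follows. The definition in 2.3322 is not reproduced in this section, so the sketch assumes the standard Gödel-Gentzen clauses (atoms doubly negated, disjunction and existential quantifier replaced by negated conjunction and negated universal quantifier); the tuple representation of formulae and the names `negative` and `is_negative` are hypothetical.

```python
def neg(a):
    """Formal negation of a formula."""
    return ("not", a)

def negative(f):
    """Negative version of f, assuming the standard Goedel-Gentzen clauses.
    Formulae are tuples: ("atom", name), ("not", A), ("and", A, B),
    ("or", A, B), ("imp", A, B), ("all", x, A), ("ex", x, A)."""
    tag = f[0]
    if tag == "atom":
        return neg(neg(f))
    if tag == "not":
        return neg(negative(f[1]))
    if tag in ("and", "imp"):
        return (tag, negative(f[1]), negative(f[2]))
    if tag == "or":
        return neg(("and", neg(negative(f[1])), neg(negative(f[2]))))
    if tag == "all":
        return ("all", f[1], negative(f[2]))
    if tag == "ex":
        return neg(("all", f[1], neg(negative(f[2]))))
    raise ValueError(f"unknown connective: {tag}")

def is_negative(f):
    """True if f contains neither 'or' nor 'ex' (atoms are not inspected)."""
    tag = f[0]
    if tag == "atom":
        return True
    if tag in ("or", "ex"):
        return False
    # For "all", f[1] is the bound variable (a string), so it is skipped.
    parts = f[2:] if tag == "all" else f[1:]
    return all(is_negative(p) for p in parts)

P, Q = ("atom", "P"), ("atom", "Q")
print(negative(("imp", P, ("ex", "x", Q))))
```

The clause for implication is what makes the translation of an implication the implication of the translations, which is the property used in the argument above.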

At first sight the second method looks unpromising because of
3.302: in our paradigm, where $\mathfrak{A}$ consists of purely universal
formulae, $A \rightarrow B$ is not in $\mathfrak{A}$ even if A and B both are. But it
turns out to be the method of the subject.

3.32. Cut Elimination in Predicate Logic. For a nice list of cut
free rules see [7]. The rules have two striking properties; after the
corollaries we shall examine what weaker properties would be equally
useful. (As stated in 3.242, intuitionistic logic is studied.)

3.321. A cut free proof of A contains only subformulae of A,
that is, logical components, or $\mathfrak{A}(t)$ if A is $\bigwedge x\,\mathfrak{A}(x)$ or $\bigvee x\,\mathfrak{A}(x)$ and
t is an individual variable or constant.

3.322. There are obvious derived rules which are not valid as
implications, for example,

3.3221. for negative (and somewhat more general [45]) A, if
$\vdash A \rightarrow \bigvee x\,B(x)$ then $\vdash \bigvee x\,[A \rightarrow B(x)]$, but not, for all these A,
$\vdash [A \rightarrow \bigvee x\,B(x)] \rightarrow \bigvee x\,[A \rightarrow B(x)]$; cf. 2.34 concerning the axiom of choice.

3.3222. For quantifier free P(x), if $\vdash \neg\neg\bigvee y\,P(y)$ or equivalently
$\vdash \neg\bigwedge y\,\neg P(y)$ then there is a finite sequence of terms $t_1, \ldots, t_n$
such that $\vdash \neg[\neg P(t_1) \wedge \cdots \wedge \neg P(t_n)]$ or, equivalently,

$\vdash \neg\neg[P(t_1) \vee \cdots \vee P(t_n)]$.

Note the following connection with 3.3221. Take a finite set $A_1$ of
axioms for arithmetic which satisfies the conditions of [45] and
implies $P(t) \vee \neg P(t)$ for each term t and quantifier free P. Then

$\neg\neg\bigvee y\,P(y) \leftrightarrow \{[A_1 \wedge \neg\bigvee y\,P(y)] \rightarrow \bigvee y\,P(y)\}$; take the premise
for A. Now cf. 2.322.





3.323. First Main Result. The cut rule is: from (i) $A \rightarrow (B \vee C)$,
(ii) $(B \wedge D) \rightarrow E$ infer (iii) $(A \wedge D) \rightarrow (E \vee C)$. There is a primitive
recursive e(p, q) such that, if p proves (i) and q proves (ii) (without
cut), then e(p, q) proves (iii) without cut. This fact is also proved
in primitive recursive arithmetic. The length of the proof e(p, q)
grows exponentially with the number of iterated implications and
quantifiers in B. [31]
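
As a sanity check on the shape of the rule, one can verify by truth tables that the conclusion (A and D) implies (E or C) holds whenever the premises A implies (B or C) and (B and D) implies E do. The sketch below is an illustration only, with hypothetical function names: it tests classical two-valued validity for propositional letters, which is weaker than the intuitionistic derivability studied here and says nothing about the function e(p, q).

```python
from itertools import product

def implies(p, q):
    """Classical (truth-functional) implication."""
    return (not p) or q

def cut_rule_is_classically_sound():
    """Check all 32 truth assignments to A, B, C, D, E: whenever premises
    (i) and (ii) are true, conclusion (iii) must be true as well."""
    for a, b, c, d, e in product([False, True], repeat=5):
        premise_i = implies(a, b or c)          # (i)   A -> (B v C)
        premise_ii = implies(b and d, e)        # (ii)  (B & D) -> E
        conclusion = implies(a and d, e or c)   # (iii) (A & D) -> (E v C)
        if premise_i and premise_ii and not conclusion:
            return False
    return True

print(cut_rule_is_classically_sound())  # True
```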

Corollary. Every theorem of ordinary intuitionistic predicate
logic can be proved without cut. N.B. No assumption of completeness
of the rules is made.

3.3231. Some Interpretations. F predicate logic, $F_0$ intuitionistic
propositional logic, $F_1$ the fragment of arithmetic needed for 3.323,
$\mathfrak{A}$ purely universal, purely existential formulae or their double
negations. By 3.3222 $\mu$ is an interpretation if: $\mu(n, A) = A$ if A
does not contain existential quantifiers; if A is $\bigvee x\,P(x)$, $\mu(n, A)$ is
$P(t_n)$ (in some enumeration of terms); if A is $\neg\neg\bigvee x\,P(x)$,
$\mu(n, A) = \neg\neg[P(t^n_1) \vee \cdots \vee P(t^n_{k_n})]$. This interpretation is discriminating
(for $\mathfrak{A}$). It can be extended to all prenex formulae and their
negations (no counterexample interpretation).

It does not seem to be known whether the whole of (intuitionistic) 
predicate logic has a discriminating interpretation in its quantifier 
free part. 

If one regards classical logic F as the negative part of intuitionistic 
logic, one has trivially a discriminating interpretation (identity). 
But there is no discriminating interpretation of the latter in its 
negative part; this expresses in precise terms the impression that 
classical logic is essentially simpler. 

3.3232. Applications to Particular Axiomatic Theories. Suppose
the axioms considered are quantifier free, that is, purely universal,
and, on the intended interpretation, the function terms denote
combinatorial rules. This holds for fragments of arithmetic in
which the axioms are suitable recursion equations, and induction,
applied to quantifier-free formulae, is a rule of inference; also for
fragments of geometry, theory of algebraically closed fields applied
to the field of algebraic numbers, and so on. Then 3.3231 shows
essentially that the use of quantification theory in proofs of quantifier
free formulae can be eliminated, 36 since 3.302 allows one to add
universal axioms.

36 The use of rules of inference: if quantification theory is not needed in the
proof of the premise, apply the rule to the quantifier free theory, and start again.

This is a true reduction: for the intended meaning of the logical
apparatus assumes either (in the classical case) the existence of
infinite collections or the abstract theory of constructions in 2.2.
In the cases above prime formulae are decidable, and so the propositional
operations of 2.2 reduce to truth functions.

Whatever the verdict on Hilbert’s program this result shows 
that the mechanistic conception has a far wider application than 
appears on the surface. 

The results above extend naturally to all prenex formulae and
their negations. Thus, for example, the formal independence of
$\bigwedge x \bigvee y\,A(x, y)$ can be shown without giving an interpretation of the
axioms where it is refutable, but by showing that $\bigwedge x\,A[x, t(x)]$
is not satisfied by any of the particular function terms used in the
interpretation.

3.3234. Reflexiveness (cf. 3.306). Here 3.321 is exploited.
Suppose $F_0 \subset F_1$, and, for any given A there is an enumeration
(restricted truth definition of 1.821) in $F_1$ of the subformulae of A.
Further, relative to this enumeration the validity of the (cut free)
rules of $F_0$ can be proved in $F_1$. Then $F_1$ is reflexive over $F_0$. (For
careful statements in the classical case, cf. [53]; but the work carries
over to the intuitionistic case except that not only the depth of
quantifiers but also of iterated implications must be counted.)

What makes the observation useful is that for none of the usual
$F_1$ is there a truth definition for all formulae, but often there is one for
subformulae of any given formula (subformula in the sense of 3.321); 37
for example, $F_1$ is first order arithmetic, $F_0$ predicate logic restricted
to the notation of $F_1$. Following the convention of 3.3, boldface
letters denote expressions of $F_0$, and when enclosed in corners terms
of $F_1$ representing their numbers.

N.B. While the argument for 3.323 is purely equational (primitive
recursive arithmetic), the proof of reflexiveness applies induction
to $\mathrm{Prov}_0(m, \ulcorner \mathbf{A}[S^n 0] \urcorner) \supset A(n)$ where A(n) is the restricted
truth definition, that is, in general to a quantified formula.
Conversely, assume the schema 3.306 and proofs in $F_0$ of $\mathbf{P}(0)$,
$\mathbf{P}(n) \rightarrow \mathbf{P}(n + 1)$ with free variable n. Then, by elementary
arithmetic: $\mathrm{Prov}_0[\sigma(n), \ulcorner \mathbf{P}(S^n 0) \urcorner]$, and (by 3.306): P(n); that is,
modulo this bit of arithmetic, the schema 3.306 is equivalent to full
induction.

37 Completely parallel work applies to quantifier free systems; only here truth
definitions for formulae reduce to valuation functions for terms.

This type of equivalence will be generalized in 3.3321. Note that
for Hilbert’s original program, reflexiveness was of no interest
except when A represents a combinatorial statement, when it
reduces to consistency. But even for so-called $\omega$-consistency it is
useful, that is, with free variable A over formulae with the formal
free variable x: W(A)

$(n)\{\mathrm{Prov}_0[n, \ulcorner \neg\bigwedge x\,A(x) \urcorner] \supset {\sim}(m)(Ep)\,\mathrm{Prov}_0[p, \ulcorner A(S^m 0) \urcorner]\}$

For each fixed A, this follows from 3.306, that is, can be proved in $F_1$.
To get $\omega$-consistency itself, one only needs reflection over $F_1$ applied
to the formula W(A). Intuitionistically, $\omega$-consistency is equivalent
to (the negation of a prenex formula)

${\sim}(EA)(En)(m)(Ep)\{\mathrm{Prov}_0[n, \ulcorner \neg\bigwedge x\,A(x) \urcorner] \,\&\, \mathrm{Prov}_0[p, \ulcorner A(S^m 0) \urcorner]\}$

and thus can be interpreted in quantifier-free form provided 3.3231
is suitably extended; cf. 3.33.

3.33. Cut elimination in arithmetic H (2.412) is not possible in the
following sense: there is no recursively enumerable sequence of
derived rules with the following properties: (a) (for the canonical
representation of the rules) it can be proved in H that the rules
satisfy 3.321; (b) proof figures are finite so that induction can
be applied as in 3.3234. For then we should have in H, for the
derived proof relation $\mathrm{Prov}_d$: $\mathrm{Prov}_d(n, \ulcorner A \urcorner) \supset A$, contradicting
3.232. In fact, by 3.2321, (a) can be weakened to (a'): purely
universal statements can be derived only from purely universal
statements; and, of course, other variants.

Now, without (a), the cut free reformulation would lose much of
its purpose. The clue for a proper weakening of (b) is this: to
use infinite proof figures, already mentioned in footnote 28. The
infinitary rule to be used is the so-called $\omega$-rule: from A(0), A(1),
. . . infer $\bigwedge x A(x)$.

What restrictions should be put on the proof figures (if the
resulting system is to be equivalent to ordinary arithmetic)? First,
the obvious way of ensuring inclusion of the new system in the old
is that one should be able to “talk” about the new system in the old.
Precisely, each proof figure is a partial ordering of formulae with
premises preceding conclusions: it should be definable in the old
system, and it should be provable that (the formula at) any node
is related to its immediate predecessors according to the rules.
Finally, instead of (b), what is required is this: transfinite induction
be provable for the partial ordering, applied to

$\mathrm{Prov}_I(m, \ulcorner A(S^n 0) \urcorner) \supset A(n)$

where $\mathrm{Prov}_I(m, A)$ means: the formula A appears at the node m; and
A(n) is the restricted truth definition. Second, there is the combinatorial
problem of converting each proof of A in the ordinary
system into an infinite proof in the new one. To be useful the
infinite proof figure must have the properties just mentioned.
[“Combinatorial” because quantifier free, that is, for a suitable
function constant $\pi$: if $\mathrm{Prov}_H(n, A)$, $\pi(n)$ is (the number of) a
description of an infinite proof tree for A.]

3.331. Basic Lemma. Suppose an ordering < is defined in (formal)
arithmetic and there are terms $z_0$, $z_1$, $w$, $s(x, y)$, $\exp(x)$, which
can be proved to satisfy the usual defining conditions for 0, 1, $\omega$,
ordinal addition, exponentiation to the base 2, and their inverses.
(For precise statement, cf. [14].) Then, for $\exp(x)$ in the field of
the ordering, transfinite induction (t.i.), in the sense of 2.623, up to
$\exp(x)$ applied to a formula Q can be formally derived from t.i.
applied to Q' up to x. Q' is defined from Q by means of a numerical
quantifier. Further, there is an ordering (the first one thinks of) of
ordinal $\varepsilon_0$ which satisfies the conditions above.
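
To make "the ordering one first thinks of" concrete, here is a minimal sketch of notations for ordinals below $\varepsilon_0$. It is an illustration only, not the construction of [14]: it assumes Cantor normal form to the base $\omega$ (the lemma itself uses base 2), and the names `cmp_ord`, `add` and `omega_pow` are hypothetical.

```python
# An ordinal below epsilon_0 in Cantor normal form (base omega, an assumption
# of this sketch):  omega^e1 * c1 + ... + omega^ek * ck,  e1 > ... > ek, ci > 0.
# It is stored as a tuple of (exponent, coefficient) pairs, where each
# exponent is again such a tuple. The empty tuple stands for 0.
ZERO = ()
ONE = ((ZERO, 1),)
OMEGA = ((ONE, 1),)


def cmp_ord(a, b):
    """Return -1, 0 or 1 according to a < b, a == b, a > b."""
    for (ea, ca), (eb, cb) in zip(a, b):
        c = cmp_ord(ea, eb)
        if c != 0:
            return c
        if ca != cb:
            return -1 if ca < cb else 1
    # All shared leading terms agree: the longer normal form is larger.
    return (len(a) > len(b)) - (len(a) < len(b))


def add(a, b):
    """Ordinal addition: terms of a below the leading exponent of b are absorbed."""
    if not b:
        return a
    eb, cb = b[0]
    kept = []
    for ea, ca in a:
        c = cmp_ord(ea, eb)
        if c > 0:
            kept.append((ea, ca))
        elif c == 0:
            # Equal leading exponents merge their coefficients.
            return tuple(kept) + ((eb, ca + cb),) + b[1:]
        else:
            break
    return tuple(kept) + b


def omega_pow(e):
    """omega raised to the ordinal e."""
    return ((e, 1),)
```

In this representation `add(ONE, OMEGA)` collapses to `OMEGA` while `add(OMEGA, ONE)` does not, which is the usual non-commutativity of ordinal addition; both operations are plainly primitive recursive on the notations.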

The proof in [14] is stated for classical arithmetic, but uses only
intuitionistic rules. The particular conditions imposed on the
terms $s(x, y)$, $\exp(x)$ have this consequence. Suppose for another
ordering <', terms $z_0'$ etc. can be proved to satisfy the same conditions,
and the orderings restricted to predecessors of x, x' resp.
have ordinal $\alpha$ ($< \varepsilon_0$). Then the isomorphism between them can
be proved formally. 38

3.332. Second Main Result [14]. Suppose there are cut free proof
figures for $A \to (B \lor C)$, $(B \land D) \to E$ which can be proved in H to
satisfy the conditions of 3.33. Then one gets, primitive recursively
in the description of these proof figures, a definition in H of a cut free
proof figure of $(A \land D) \to (E \lor C)$. The latter also satisfies the
conditions of 3.33. If the ordinals of the given proof figures are $< \alpha$, the
ordinal of the new one grows exponentially with the number of quantifiers
and iterated implications in B.

38 To state this one needs function symbols, for example, $H_1$ of 2.412, which is a
conservative extension of H. The result above is also of interest without the
requirement of provability; for example, if the functions mentioned are recursive,
the ordering is necessarily recursive, and so is the isomorphism. This
intrinsic characterization of natural $\varepsilon_0$ orderings has been extended [23] to
ordinals beyond $\varepsilon_0$ for suitable additional ordinal functions. It would be
interesting to extend it to all recursive ordinals.

Corollary. Any proof of H can be reduced to a cut free proof of
ordinal $< \varepsilon_0$, that is, in the precise sense of 3.331. For, each use
of induction in H can be replaced by a single application of the
$\omega$-rule; since there are only finitely many in any proof of H, we have
an immediate reduction to a proof figure of ordinal $< \omega^2$ (with
cuts).

Actually, the result states more than has so far appeared in the 
literature or (as far as I know) been done in detail. The work of 
[14] is formulated for the classical system, and somewhat informally 
only. There are indications for a more or less convenient description
of infinite proof figures in [57]. The result stated above is what is
needed for interesting applications (for example, reflexiveness of
$\varepsilon_0$-induction over H).

3.3321. The consistency of H is proved by applying transfinite
induction on $\varepsilon_0$ in the following form (to a purely combinatorial statement):
for a certain primitive recursive r, $r(n) < n \lor n = z_0$,
$A[r(n)] \supset A(n)$, $A(z_0)$ are proved by combinatorial methods and then
A(n) is inferred. (To do this one arranges that the proof figures are
described primitive recursively; this is not troublesome.)

3.3322. More generally, H together with $\varepsilon_0$-induction is reflexive
over H. For, first, by quantifier-free $\varepsilon_0$-induction,
$\mathrm{Prov}(n, A) \supset \mathrm{Prov}_{cf}[\pi(n), A]$ (for the cut free system), and then
$\mathrm{Prov}_{cf}(m, A) \supset A$. As in 3.3234, the reflexion schema is equivalent to
$\varepsilon_0$-induction applied to arithmetic formulae.

Why can the use of transfinite induction be even contemplated 
as a reduction of ordinary induction? Ordinary induction is applied
to quantified formulae which have, prima facie, no combinatorial 
meaning at all. The application of transfinite induction is at least 
stated in combinatorial terms. 

3.34. Refinements and Extensions. The most significant extension
to date, namely application to ramified analysis, will be treated
below.

3.341. Addition of Function Variables. It seems plausible that
cut elimination can be extended to $H_1$ or $H^1$ (2.51) with the full
axiom of choice. The case of free function variables is implicit
in [14], although the results are stated for function constants. For,
all that is assumed in [14] are axioms of the form
$a(S^n 0) = S^m 0 \lor a(S^n 0) \neq S^m 0$. The appropriate additional rule is: infer $A(\alpha)$
from $\alpha(S^n 0) = 0 \to A(\alpha)$, $\alpha(S^n 0) = S0 \to A(\alpha)$, and so on.

In a cut free proof of $\bigvee x A(\alpha, x)$, the only way of introducing the
existential symbol is then:
$[\alpha(0) = S^{n_0} 0 \land \cdots \land \alpha(S^m 0) = S^{n_m} 0] \to \bigvee x A(\alpha, x)$
from $A(\alpha, S^p 0)$ for some fixed $p, n_0, \ldots, n_m$. A consequence, already
referred to on page 143:
if $A(\alpha, x)$ is monotone and $\bigvee x A(\alpha, x)$ is proved in $H^1$ (with A in
place of R in 2.6231) then bar induction is also provable in $H^1$
(2.6231).

3.3411. The inclusion of free function variables permits the
extension of 3.3231 (no counterexample interpretation) by applying
transfinite induction to formulae containing free function variables.
Let A be the negation of a prenex formula:

$\neg \bigwedge x_1 \bigvee y_1 \ldots \bigwedge x_n \bigvee y_n R(x_1, \ldots, x_n, y_1, \ldots, y_n)$.

Then

$A \to \neg \bigwedge x_1 \ldots x_n R[x_1, \ldots, x_n, \alpha_1(x_1), \ldots, \alpha_n(x_1, \ldots, x_n)]$

which we write $\neg \bigwedge x S(\alpha, x)$, where S is quantifier-free. But the
derived rule 3.3222 carries over to arithmetic: 39 so $\bigvee x \neg S(\alpha, x)$ is
also provable. Apply 3.341.

3.342. The Significance of $\varepsilon_0$. As shown in [14], if instead of
starting with the schema of ordinary ($\omega$) induction, one starts with
induction on an arbitrary ordering $\alpha$, one gets to the next $\varepsilon$-number
past $\alpha$ (for orderings satisfying the conditions cited in 3.331).

3.3421. For negative results much weaker conditions are needed.
Evidently, by 3.3321, for orderings of ordinal $\varepsilon_0$ satisfying the
conditions of 3.331, transfinite induction cannot be proved in H
(even for certain primitive recursive predicates). But much more
is true, by use of 3.341.

Suppose x < y is an arithmetic ordering (or even one defined, in
prenex form, with existential function quantifiers only), and transfinite
induction can be proved for it for all formulae containing free
function variables. Then its ordinal is $< \varepsilon_0$.

Consider the statement which expresses that $\alpha$ is an order preserving
mapping of the natural $\varepsilon_0$ ordering into a segment of x < y.
It has the form: $\bigwedge u\, \mathfrak{A}(\alpha, u)$ where $\mathfrak{A}$ is arithmetic. By 3.3321

$\mathrm{Prov}[n, \ulcorner \neg \bigvee \alpha \bigwedge u\, \mathfrak{A}(\alpha, u) \urcorner] \to \neg \bigvee \alpha \bigwedge u\, \mathfrak{A}(\alpha, u)$

is proved by induction on $\varepsilon_0$ applied to a particular formula G(a).
But from $\bigwedge u\, \mathfrak{A}(\alpha, u)$ together with (provable) transfinite induction
on < applied to a suitable Q'(a), we have induction on $\varepsilon_0$, and hence
$\{\bigwedge u\, \mathfrak{A}(\alpha, u) \land \mathrm{Prov}[n, \ulcorner \neg \bigvee \alpha \bigwedge u\, \mathfrak{A}(\alpha, u) \urcorner]\} \to \neg \bigvee \alpha \bigwedge u\, \mathfrak{A}(\alpha, u)$, that
is, $\mathrm{Prov}[n, \ulcorner \neg \bigvee \alpha \bigwedge u\, \mathfrak{A}(\alpha, u) \urcorner] \to \neg \bigvee \alpha \bigwedge u\, \mathfrak{A}(\alpha, u)$. But then, by
3.233, we have a proof of $\neg \bigwedge u\, \mathfrak{A}(\alpha, u)$. (There is a corresponding
argument for the cut free system, only now $\varepsilon_0$-induction is needed
for a formal proof of closure under cut.)

39 So does 3.3221: this wholesale carry over of derived rules from predicate logic
is one of the special virtues of cut free infinite proofs. An important exception
is this: if A v B is provable in predicate logic either A or B is provable. In
arithmetic, only if A and B are closed.

It is essential to demand that t.i. on x < y be proved for formulae
containing free function variables since there are orderings which 
are not even well orderings, yet t.i. can be proved for all arithmetic 
predicates Q. 

3.343. Cut elimination and consistency proofs. 3.3321 makes it 
clear that the foundational significance of consistency proofs by 
transfinite induction depends on the justification of the latter, and 
3.331 suggests that this justification should be in terms of the way 
the ordering is built up. The following quite trivial remarks summarize
the formal situation. To fix ideas, consider formal systems
of analysis.

Consistency can be proved by t.i. on a primitive recursive well
ordering of ordinal $\omega$, simply because it is a universal statement.
[$\bigwedge u A(u)$ can be proved by t.i. on the following ordering: $x \prec y$ if
either $x < y \land (\bigwedge u < x) A(u)$ or $y < x \land \neg (\bigwedge u < y) A(u)$.] If we
take the union of all primitive recursive orderings which can be
proved to be $\omega$ orderings in say analysis, the consistency can be
proved by induction on this $\omega^2$ ordering. For every finite subsystem
can be proved to be consistent, and so, by induction on one
of the orderings in the sequence.
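
The bracketed ordering can be tried out directly. The following is a small sketch, assuming a decidable predicate `A` given as a Python function; the names `forall_below` and `prec` are hypothetical. Where A holds everywhere the relation is just the usual <; where A first fails at some k, the relation reverses above k, so that k+1, k+2, . . . form a descending chain. This is why well foundedness of the ordering already carries the full content of the universal statement.

```python
def forall_below(A, x):
    """Bounded quantifier: A(u) holds for every u < x."""
    return all(A(u) for u in range(x))


def prec(A, x, y):
    """x precedes y in the ordering built from the decidable predicate A.

    Either x < y and A holds below x, or y < x and A already fails below y.
    """
    return (x < y and forall_below(A, x)) or (y < x and not forall_below(A, y))


def always_true(u):
    return True


def fails_at_three(u):
    # Hypothetical counterexample: A(3) is false, A is true elsewhere.
    return u != 3
```

With `always_true`, `prec` agrees with < on every pair; with `fails_at_three`, `prec(fails_at_three, 5, 4)` and `prec(fails_at_three, 6, 5)` both hold, exhibiting the start of the descending chain above the first counterexample.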

The situation is quite different if one examines cut elimination 
itself. Suppose we demand 3.33 (a) and the validity of the rules 
to be proved in elementary analysis, and instead of (b) merely that 
the proof figures be recursive and well founded. Then the bar 
theorem (2.623) even for recursive R does not have a cut free proof. 
Since 3.33 (a) was essential to the more interesting applications of 
cut elimination, a generalization of the notion of cut elimination 
for analysis has quite a different character [72]. 
3.35. Other Approaches. The use of finite types (2.321) has been
described at length in the previous section. It can be taken as a
justification of transfinite induction up to $\varepsilon_0$ or else be regarded
computationally and reduced to t.i. (2.35).

3.351. Another method, particularly natural for the classical
system, is Hilbert's $\varepsilon$-substitution method, where quantifiers are
replaced by so-called $\varepsilon$-symbols. A very elegant and mathematically
satisfying treatment is given in [68]: it presents a reformulation
parallel to 3.24. Hilbert's substitution problem requires the
replacement of $\varepsilon$-terms by numbers to satisfy a finite set of purely
numerical conditions; the possibility for doing this is ensured by the
intended meaning. Hilbert introduced a particular sequence of
approximations: to show that this converges is a problem, but, in
the case of arithmetic, easily settled by nonconstructive continuity
considerations. In [68], the problem is generalized to a class
of functional equations and one seeks to determine how the solutions
depend on the parameters of these equations. The natural solution of
this problem is constructive and makes the appearance of $\varepsilon$-numbers
very clear.

Actually, in [17] a sequence of approximations was defined for 
the case of analysis. The proof of convergence given was defective. 
It does not seem to be known (even nonconstructively) if the method 
converges at all. 

3.352. By extension of 3.351, there is a direct treatment of the 
no-counterexample interpretation for classical arithmetic [69].
In fact, to date it is the only detailed treatment for the assertions 
of 3.3411. It applies to the intuitionistic case since the negation 
of a prenex formula is provable in H if and only if it is provable 
classically (by 3.31 and 2.33). 

3.4. Finitist Proofs. Perhaps the most satisfactory kind of 
theory of finitist proof, or any other kind of reasoning, would take 
this form: properties of these proofs are formulated in terms of a 
basic conceptual framework, such as (some extension of) the 
theory 2.2, properties which are sufficient to characterize the notion 
or, at least, to decide specific questions about them. This would 
correspond to a definition of the class of hereditarily finite or 
hereditarily countable sets in abstract set theory (cf. also 1.32 or 
1.835). Such a theory about finitist proofs would be independent of 
cumbersome notation (cf. 3.4144). The present section, which 
gives a description of finitist proofs, is at the other end of the scale 
and uses ad hoc devices. One reason is that the known abstract
theories are not rich enough to express these devices. But a possible
independent interest of the present section is that its formal
derivations reflect more closely the complexity of finitist proofs, in 
contrast to a theory about them. 

3.41. About finitist proofs. Explanations of the intended meaning
are given on page 119 or 3.111 in terms of the distinction
between concrete and abstract operations; less technically, one
speaks of proofs that one can see or visualize. The latter is more
vivid, but misleading unless one remembers that our primary subject
is a theoretical notion for the analysis of actual visualizing, not
that experience itself. 40

Warning. The name “finitist” was introduced by Hilbert
because he thought that finiteness was the essential feature of the
elementary kind of evidence (cf. 3.111) he had in mind. While it is
clear that the property of proofs he stressed is significant, it should
not be assumed that his analysis was correct. The analysis is not 
plausible: he wanted to go beyond old-fashioned formalism; for this 
it was essential to have general assertions like the commutative law, 
not confined to a finite domain, which are capable of elementary 
evidence. 

The formalisms below are quantifier-free systems with variables 
over natural numbers (n, m, p, q, . . .), constants for number
theoretic functions (a, b, c, . . .) introduced by defining equations, 
and propositional combinations of equations. Also there are some 
essential auxiliary expressions (cf. 3.2341). Primitive recursive 
arithmetic (PR) is a simple example. 

3.411. The meaning of: a(n) = b(n). First of all, if such a statement
is finitistically proved, it is true on the (corresponding) interpretation
of Section 1; also it is constructively proved in the sense
of Section 2. But these consequences are here purely incidental: 
the restriction to finitist reasoning is quite unnecessary for them, 

40 Perhaps, related to what can actually be visualized (by a particular person
at a particular time), as the theoretical (mathematical) notion of: constructible
by ruler and compass is related to: practically constructible by means of some
particular ruler and compass, etc. Note at once: (a) The notion below is
crude because it does not give a realistic measure of complexity for the (visualized)
structures considered; it uses ordinals $< \varepsilon_0$: how much bigger is $\omega^{\omega^\omega}$ than
$\omega^\omega$? (b) The work is about proofs concerning discrete structures, typically:
arithmetic; whether evidence, that is, seeing, in geometric reasoning requires an
essentially different analysis is left open.
and it is precisely this restriction which is our main subject.
Instead, a(n) = b(n) expresses that one can (theoretically) visualize
the whole computation procedures for a(0), a(S0), . . . and
b(0), b(S0), . . . and convert them into one another by restructuring.
A justification of formal rules consists in convincing oneself
that a formal proof of a(n) = b(n) enables one to do just this.
Evidently, the use of combinators would be much more suitable 
than our use of equational systems; certainly, even the rule of 
substitution requires analysis for the present interpretation. 

3.412. Definition Principles. Recall the section on Church’s
thesis. The analysis of computations makes it quite clear that all
finitist definitions of number theoretic functions can be reduced to
the form of recursion equations, that is, this: A propositional combination
$E(f, \mathbf{n})$ of equations built up from the function letter $f$,
constants, and sets $\mathbf{n}$ of variables $(n_1, \ldots, n_k)$ such that, for each
numeral $S^n 0$, the value of $f(S^n 0)$ is finitely determined. 41 To be
finitist this fact must be finitistically recognized. Suppose a collection
$\mathcal{S}$ of such accepted equations together with the structure of
all their computations has been visualized. This assumption is
formally expressed by introducing a valuation function $V_{\mathcal{S}}$ for terms
built up from the symbols of $\mathcal{S}$, with the obvious equations for $V_{\mathcal{S}}$
(footnote 37): if $a_0$ is a constant in $\mathcal{S}$ with Gödel number $\bar{a}_0$, and
the term $a_0(S^m 0)$ has number $s(\bar{a}_0, m)$, the term $a_0(b)$ has number
$r(\bar{a}_0, \bar{b})$, then $V_{\mathcal{S}}[r(\bar{a}_0, \bar{b})] = V_{\mathcal{S}}\{s[\bar{a}_0, V_{\mathcal{S}}(\bar{b})]\}$, and similarly for
each equation in $\mathcal{S}$.
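
The shape of the valuation equation, value of a(b) equals value of a applied to the numeral for the value of b, can be seen in a toy evaluator. This is a sketch under loud simplifications: terms are handled directly as nested pairs instead of through Gödel numbers, and the collection `S` with its constants `double` and `succ` is hypothetical.

```python
# Hypothetical collection S: each function constant is given by recursion
# equations, transcribed here as ordinary Python definitions.
def double(n):
    # double(0) = 0,  double(S n) = S S double(n)
    return 0 if n == 0 else 2 + double(n - 1)


def succ(n):
    return n + 1


S = {"double": double, "succ": succ}


def V(term):
    """Valuation of a closed term.

    A term is either an int (a numeral) or a pair (constant_name, argument).
    """
    if isinstance(term, int):
        return term
    name, arg = term
    # Mirrors the valuation equation: first evaluate the argument to a
    # numeral, then evaluate the constant applied to that numeral.
    return S[name](V(arg))
```

For instance `V(("double", ("succ", 3)))` first reduces the inner term to the numeral 4 and only then runs the computation of `double` on it.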

3.413. Learning to Visualize. The basic equipment assumed is
this. If one can visualize a structure, then also a sequence of $\omega$
copies; more generally, if a is a function already introduced, and s a
structure, the sequence s, a(s), a[a(s)], ... can be visualized.
Evidently, it is understood that the latter is given as this sequence,
and not merely in extension, since otherwise the previous exercise
of visualizing s would not help.
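
A loose programming analogy for "given as this sequence, and not merely in extension": the sequence is presented by the rule that produces each item from the previous one, not by a finished table of values. The sketch below is only an analogy, and the name `iterate` is hypothetical.

```python
from itertools import islice


def iterate(a, s):
    """Yield s, a(s), a(a(s)), ... one item at a time, by the generating rule."""
    while True:
        yield s
        s = a(s)


# The first five items of the sequence generated from s = 1 by doubling.
first_five = list(islice(iterate(lambda n: 2 * n, 1), 5))
```

Any finite initial segment can be read off, but what is held is the procedure, which is what allows the step to be repeated on the sequence itself.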

3.414. Iteration beyond $\omega$. For stating the facts we use notations
for ordinals $< \varepsilon_0$ built up from 1, $\omega$, +, · and exponentiation;
Greek letters $\xi$, $\eta$, . . . are variables for them, and bold face for
particular figures. The choice will explain itself; concerning the
properties of this particular notation needed below, cf. 3.31. By

41 For some substitution $\mathbf{n}_0$ of numerals and m, $E(f, \mathbf{n}_0) \vdash f(S^n 0) = S^m 0$.
This latter relation is decidable for fixed n, m, $\mathbf{n}_0$. The mere fact of finite
determination is not characteristic for finitist rules but only for recursiveness.
above (3.413), if an iteration (of some operation) up to $\xi$ has been
visualized, then it can also be visualized to $\xi \cdot \omega$. So much is clear.
Equally it is clear that it may so happen that iteration to each
$\xi < \xi_0$ can be visualized (with more and more effort!) but not $\xi_0$
itself. What one would like to say is this: if one sees how to visualize
iteration up to each $\xi < \xi_0$, then $\xi_0$ itself can be visualized. The problem
is to put this into formal terms. Or, more vividly, can we find a
formal expression $L(\xi)$ such that a formal proof of $L(\xi)$ by suitable
rules enables us to visualize iteration up to $\xi$?

3.4141. Throughout “iteration up to $\xi$” means iteration from $\mathcal{S}$ to
$V_{\mathcal{S}}$ and primitive recursive definitions in $V_{\mathcal{S}}$ (3.412).

3.4142. As always in case of doubt, one has to treat positive and
negative questions differently, namely that each $\xi < \varepsilon_0$ can be
theoretically visualized, and none beyond. The latter is to be done
by finding suitable closure (inaccessibility) properties of $\varepsilon_0$.

3.4143. We shall introduce a symbol (pseudoproperty) $O$ applied
only to particular figures $\boldsymbol{\xi}$, and not used for building up composite
formulae. The intended meaning of $O(\boldsymbol{\xi})$ is: $\boldsymbol{\xi}$ can be visualized
(as in 3.4141). Since the totality of all (concrete) structures
which can be visualized, cannot be visualized, we do not use variable
$\xi$ though bounded variables would be possible. More important,
negations of (hence implications with the premise) $O(\boldsymbol{\xi})$, even for
constant $\boldsymbol{\xi}$, would refer again to a totality that cannot be visualized,
and are therefore excluded altogether.

3.41431. (Cf. footnote 40.) For a more realistic theory of actual
practice the step from $\xi$ to $\xi \cdot \omega$ may have to be restricted for larger
$\xi$. A little experimentation shows that even for relatively small $\xi$
it is not natural to visualize the configurations involved. Put
differently: even when we could prove an assertion finitistically,
in practice we do not.

3.4144. It is possible that a more elegant theory would use logical 
connectives and quantifiers, an interpretation in terms of the basic 
concepts here used, and rules valid for it. It is not excluded that 
the simple theory 2.21 has this property, that is, is valid if one 
interprets 7 r [a, x * t>(x)] as: a is a finitist proof of v(x) = 0. But the 
matter is delicate, and so we avoid it altogether in the 

3.42. Two Formal Systems. The general idea is this: start with
primitive recursive arithmetic, which is supposed to have been
thought through finitistically. Therefore 3.412 can be applied, and
this step is to be iterated. As we go on we carry along $O(\xi)$ to
indicate that $\xi$ has been visualized and so the step 3.412 is iterated
$\xi$ times. Symbols $P_\xi$, resp. $P_\xi^+$ are introduced provided $O(\xi)$ has
been established, and $P_\xi(n, m)$ is to express that n represents a
finitist proof of the formula m. (Formula is one that does not contain
$O$, pseudoformulae may contain $O$.) Some reasonable arithmetization
of the syntax is assumed.

Both systems contain the usual schemes of primitive recursive
arithmetic, and a supply of function constants. (It will turn out
that the $P_\xi$ are definable. But in the first place they are intended to
be used in axioms about proofs by use of $\xi$ iterations of 3.412.)

$P_0$ is an explicit proof predicate for PR.

$O(\omega)$ is an axiom. And we infer $O(\eta \cdot \omega)$ from $O(\eta)$, and for
$\xi \prec \eta$, $O(\xi)$ from $O(\eta)$.

3.421. Suppose $O(\xi)$. Then $P_\xi(p, q)$ is equivalent to the disjunction
of conjunctions:

3.4211. $p = \langle \eta, p_1 \rangle$, $\eta \prec \xi$, $P_\eta(p_1, q)$ (accumulation).

3.4212. $p = \langle p_1, p_2 \rangle$, $P_\xi(p_1, q_1)$, $P_\xi(p_2, q_2)$, and q is a consequence
of $q_1$ and $q_2$ in PR (validity of rules of PR).

3.4213. $p = \langle \eta, p_1 \rangle$, $\eta \prec \xi$, q is a recursion equation for the valuation
function of terms in $P_\eta$, $\eta \prec \xi$ (cf. 3.412).

3.4214. (for handling 0), p = ( v , Vl ), v < £, q = 0(C) and 
P ilPh £ < C —»P -1 !«(£), r 0(£)' 1 }" 1 ] (with formal variable £; note 
that 0 occurs only with fixed arguments), and, of course, p = 
<Pi> f>, q - 0(< •»), P t (px, r 0(£)‘ l ). 

3.4214 is intended to express that one has finitistically recognized
that each $\xi \prec \zeta$ can be visualized.

Finally: if $P_\xi(n, \ulcorner A \urcorner)$, then A.

3.422. It is clear that for just the ordinals tj < 0(rj) is 

provable in the system P^. 

3.423. In the second system condition 3.4214 is relaxed so that 
in the first place it is not even a formal system. 

P Z (P> q) -2 = 0(0 and for each 17 -< £, |- T 0(rf) is true for some 
r < £• 

(This is consistent with 3.414, because it is required that r ■< 0) 

3.43. Main Result. $O(\xi)$ both in 3.421 and 3.423 just for $\xi < \varepsilon_0$.
By 3.422, the same formulae are theorems of the two systems. 

3.44. Simplifications for mathematical practice (cf. 1.2 or footnote
14). While the detailed analysis of the process of finitist reasoning
requires something like the hierarchy of proof predicates $P_\xi$, in
practice one wishes to suppress them. One way is this: by 3.43
and the basic results of 3.32, the system H (of intuitionistic) or, equivalently,
Z (of classical arithmetic) provides such a simplification.

3.441. For each theorem $\bigwedge x \bigvee y A(x, y)$ (with $A \in$ PR) of H or Z,
there is a function constant a in 3.421 such that $A[n, a(n)]$ in 3.421.
Conversely, for each constant a in 3.421 there is an explicit definition
in H (for which the defining equations of a can be formally proved).

3.4411. The result extends immediately to conjunctions and disjunctions
of prenex theorems of H. These are the only formulae of
H to which an immediate finitist sense is attached. For extensions 
to other formulae one needs interpretations, cf. 3.3411. 

3.442. Perhaps a more attractive formulation is this. One adds
to PR free function variables and a constructive existential numerical
quantifier with the obvious rules. Further, the inference rule, for
relation constants R:

If $(Ex) \neg \alpha(x + 1) R \alpha(x)$ infer transfinite induction on R applied
to existential formulae.

This cannot be expressed in its full form:

from $(x)[(y R x)(Ez)A(y, z) \supset (Ez)A(x, z)]$ infer $(Ez)A(x, z)$,

but instead, from

$(Ez_1, z_2)\{A[a(x), z_1] \;\&\; [a(x) R x] \;\&\; A[b(x, z_1), z_2] \;\&\; [b(x, z_1) R x]\} \supset (Ez)A(x, z)$

infer $(Ez)A(x, z)$.

However, the latter is sufficient for so-called nested recursion on R,
and thus, by [68], one gets again to $\varepsilon_0$.

3.4421. One would like to interpret the free function variables
as variables over finitistic functions (not free choice sequences).
The mere truth of $(\alpha)(Ex)[\neg \alpha(x + 1) R \alpha(x)]$ would not be sufficient
to justify the inference; for, if finitistic functions are assumed to
be in a recursively enumerable class E of recursive functions, there
are $R_E$ such that $(\alpha \in E)(Ex)[\neg \alpha(x + 1) R_E \alpha(x)]$ but the inference
is false (for suitable $R_E$). On the basis of 3.42, we know that a
finitistic proof of the premise is in fact sufficient for the conclusion,
but there seems to be no direct way of seeing this. Cf. in 3.534 the
direct justification of the hyperarithmetic comprehension rule.

3.45. We defer the discussion of open problems on finitist proofs
to the end of the section because most of them also arise with

3.5. Predicative Proof. Consider the variant of the ramified
hierarchy of 1.5 for definitions of sets of natural numbers only.
The first step is this: given the collection N (of natural numbers),
the collection D of formal definitions of subsets of N by means of
the usual logical operations is well defined, and hence the corresponding
collection $N_1$ of subsets of N defined by D. 42 The general
step is to apply this process to a collection $N_\xi$ instead of N.
The formal expression of this idea is exactly parallel to the introduction
of a valuation function in 3.412: here, given a collection D,
we introduce instead a truth definition $T_D$ for sentences (from which
a satisfaction definition is derived by use of a substitution function
s as in 3.412).

Again the basic problem is: how often is this step to be iterated 
(cf. 1.52). There are some formal similarities to the iteration of 
3.414. But the basic difference is this: in the former, iteration has 
to be finitistically justified, here predicatively or: on the assumption 
that only the collection N is presupposed (reducibility to N). Different
formal axioms for (what corresponds to) $O(\xi)$ (above) may
be expected to follow from, that is, express the operational significance
of, these two requirements.

In 3.4 some attempt was made to say something about the 
intended meaning of finitistic justification (in terms of visualization 
of computations); in fact, if $\varepsilon_0$ is going to be the bound, it is to be
expected that all the technical apparatus is in 3.4 already. For
predicative justification, the technical apparatus has only recently
been created; it is described very fully and attractively in the recent 
[29]. It will be sufficient to report the results briefly. 

Here it should be remarked that there are also some formal differences
between the two cases; for example, the step 3.414 is very
sensitive to the description of $\mathcal{S}$, as would be expected: it depends
how we see the structure of computations, cf. [27]. The present
basic step is not, as seen from the notation free formulation in 1.5.

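3.51. A definition of: $\beta$-numbers (abstract ordinals): 0-numbers
are the powers of 2 ($\chi^0_\alpha = 2^\alpha$); for $\beta \neq 0$, $\beta$-numbers $\alpha$ satisfy
$\chi^\gamma_\alpha = \alpha$ for all $\gamma < \beta$. $\Gamma$ is the least ordinal $\gamma$ for which $\chi^\gamma_0 = \gamma$.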

Existence of such ordinals follows from known results. For
positive results, as in 3.31, it is necessary to have orderings of
ordinals $< \Gamma$ on which certain basic ordinal operations can be defined


" the classical interpretation of the logical operations if the idea is that the 
collection N is given. 
arithmetically and proved to satisfy their “recursive definitions,”
cf. [29]. 

In one respect the present theory is easier than 3.4: there are
meaningful statements of predicative mathematics which are quite
“close” to the impredicative idea of: R is well founded, for instance:
if TI(R, P) (TI for transfinite induction) stands for

$\forall x \{[(\forall y R x) P(y)] \Rightarrow P(x)\} \Rightarrow \forall x P(x)$

and $P^\xi$ is a variable over sets of the hierarchy at level $\xi$, take
$\forall P^\xi\, \mathrm{TI}(R, P^\xi)$. Now, this only expresses TI for a restricted class of
predicates, and so, as a premise of an implication, it does not have
the full force of well foundedness. But used in a rule of inference, it
does: for, a formal proof of even $\forall P^1\, \mathrm{TI}(R, P^1)$ can be used as a
schema of proofs for $\mathrm{TI}(R, \mathcal{P})$ for an arbitrary $\mathcal{P}$ [in this sense: suppose
the highest level occurring in $\mathcal{P}$ is $\eta$, and the proof of
$\forall P^1\, \mathrm{TI}(R, P^1)$ assumes the existence of $\xi$ levels; then the existence
of $\eta \oplus \xi$ levels ensures $\mathrm{TI}(R, \mathcal{P})$].

3.52. Main Result. For a natural formal theory of the ramified
hierarchy $T_\xi$, when $T_\xi$ is used only if $\forall P^1\, \mathrm{TI}(\xi, P^1)$ has been proved
in $T_\eta$ for some $\eta < \xi$: The length of the hierarchy is precisely $\Gamma$ [29].

3.521. The conclusion is shown to be insensitive to the exact 
formulation of the idea of well ordering. But more important 

3.522. The bound $\Gamma$ is unaffected if the rules of proof in $T_\xi$ are
enlarged; namely: $\eta$ applications of the nonconstructive $\omega$-rule are
permitted provided $\forall P^1\, \mathrm{TI}(\eta, P^1)$ has been proved.

3.523. $\Gamma$ is also the precise bound if one uses instead of the
ramified hierarchy of subsets of N the more elegant full hierarchy
of 1.5, with the corresponding autonomy condition.

3.53. Simplifications. One of the most satisfying results of [29]
is a beautifully elegant equivalent formulation IR (inductive recursion)
of the theory $\bigcup_{\xi < \Gamma} T_\xi$; (IR) is an ordinary second order theory
(in which the ramification is suppressed), (a) a subsystem of classical
analysis, but (b): For each theorem of (IR) there is an $\eta < \Gamma$,
such that the theorem holds in $N_\eta$. In other words, by (a) all
theorems of IR are valid for classical analysis, but also by (b) justified
on the assumption of N only. It can be verified that the natural
formalization of the bulk of classical analysis only uses the axioms
of IR.

It should be noted that the essential axiom of IR is the rule of
inference corresponding to the implication of 2.623 (bar theorem).
3.531. We give here a formulation for intuitionistic logic which is
proof theoretically equivalent to IR. (It is slightly smoother than 
IR itself because we can use the unrestricted axiom of choice 
instead of special definition principles; cf., for example, 2.421.) 
This has some independent interest (2.423); but note that, for the 
interpretation of Section 2, this theory does not have predicative 
character at all, because the intuitionistic logical operators are 
themselves defined self-reflexively. 

Add to first order arithmetic H (2.412) predicate variables S, 
T (for species of natural numbers), and the usual formation rules. 
Extend the schemata of H to the new formalism. [$S(n)$ is not
assumed decidable: in contrast to the case of function variables in
H ! or H \]

3.5311. The axiom of choice $(n)(ES)A(n, S) \supset (ES_1)(n)A_1(n, S_1)$
for arbitrary formulae $A$, where $A_1(n, S_1)$ expresses that $S_1$ is a
choice set.

3.5312. A definition principle: hyperarithmetic comprehension rule.
Suppose $A$ and $B$ contain no predicate quantifiers, but possibly
several free predicate variables; also suppose that for the formula $P(n)$
the following conditions are proved:

$(ES)A(n, S) \supset P(n)$, $(ET)B(n, T) \supset \neg P(n)$, and

$\neg\neg[(ES)A(n, S) \lor (ET)B(n, T)]$.

Then we introduce the axiom, if $U$ does not occur in $P(n)$,

$\neg\neg(EU)(n)[U(n) \leftrightarrow P(n)]$.

3.5313. An inference principle: suppose the formula $R$ is decidable
[$R(n, m) \lor \neg R(n, m)$] and, for the predicate variable $U$, $\mathrm{TI}(R, U)$ is
proved. Then, for arbitrary formulae $A$, $\mathrm{TI}(R, A)$ may be inferred.

3.5314. In contrast to 3.4421, these axioms express some evident 
properties of the intuitionistic ramified hierarchy of sets of 
natural numbers. (The argument for the classical case is parallel.) 
The assertion is that any theorem of the system holds when the
predicate variables are interpreted to range over some suitable
$I_\eta$ and beyond. 3.5311 is clear since, if the premise is proved for $I_\eta$,
the conclusion holds for $I_{\eta+1}$. Suppose the premises of 3.5312 are proved
for $I_\xi$, $\xi > \eta$; then (a) $\neg\neg[(ES)A(n, S) \leftrightarrow \neg(ET)B(n, T)]$, that is,
for $S$, $T$ ranging over $I_\xi$ ($\xi > \eta$); (b) $(ES_\eta)A(n, S) \supset (ES_\xi)A(n, S)$;
(c) $\neg(ET_\xi)B(n, T) \supset \neg(ET_\eta)B(n, T)$. Hence
$\neg\neg[(ES_\xi)A(n, S) \leftrightarrow (ES_\eta)A(n, S)]$ and, by the conditions on $P$,
$\neg\neg[P_\xi(n) \leftrightarrow P_\eta(n)]$. Take $P_\eta(n)$ for $U$. Finally, the inference principle is
justified by using the formal proof of $\mathrm{TI}(R, U)$ as a schema (cf. 3.51)
for proving $\mathrm{TI}(R, A)$. Note that these considerations would not
apply to the corresponding axioms instead of rules (cf. 3.2341).

3.6. Consequences of 3.4 for Hilbert's Original Finitist Program,
of 3.5 for the Analogous Predicative or Nominalist 43 Program.

3.61. On the positive side. The basic steps in the hierarchies 
3.4-3.5 are clear, and “practical” proofs [in full arithmetic Z, resp. 
in (IR)] use only a few of the permitted rules. So, if they are 
formalized at all in these systems, they can be reduced to the first 
few steps of the corresponding hierarchy, and therefore are unaffected 
by the problem of iteration.

3.62. On the negative side. It is certainly true that (a) in crude
finitism one contemplated only proofs that are immediately
formalizable in 3.42, particularly 3.423, and (b) in crude predicativity,
proofs that are obviously formalizable in 3.52, particularly 3.522.
So, at least as originally conceived, Hilbert’s program cannot be
carried out. 44 (Specifically, in 3.42 one cannot prove the
consistency of first order arithmetic H; in 3.52 one cannot prove the
consistency of the bar theorem 2.63; further, more conclusively,
in 3.423 one cannot prove the consistency of H; of course, all true
arithmetic statements can be proved in 3.522, but there are
statements TI(R, P) which can be proved by means of the bar theorem but
not in 3.522.)

3.621. The only support for taking $\varepsilon_0$, resp. $\Gamma$, as bounds is
empirical: different formulations lead to the same ordinal; and, for
$\varepsilon_0$, $\Gamma$ as upper bounds, that the bounds are not extended by replacing
some of the rules by their nonconstructive versions. This kind of
support is subject to the criticism of 2.73. Formally speaking, 
the difficulty is this: for example, all axioms for P* in 3.4 assert 
interrelations or the inference of a positive statement about P*, in 
other words, just closure conditions on the class of finitist proofs. 

43 Again in a suitable theoretical sense, as in footnote 40. There is some
marginal literature on a curious empirical nominalism, where mathematical
assertions are to be interpreted in terms of material inscriptions and their
(spatio-temporal) relations. But the work of 3.5 seems a more fruitful development of
the ideas implicit in crude (nonconstructive) nominalism.

44 Note that it was not clearly understood till [32] that the then current finitistic 
proofs could be easily formalized in Z , for example, the use of nested recursion 
or equations involving higher type variables.

This compares badly with the axiomatic theory of validity in 1.8
where, for example, 1.833 allows the refutation of a statement about 
validity by proving a statement not involving this notion.
Nontechnically, one can express it this way: what “more” do I know
about an identity $a(n) = b(n)$ if I know that it is finitistically
proved and not merely true? If one could express convincingly this
extra information (even partially) in ordinary arithmetic terms, one 
would have the analogue to 1.833. All we offer is: formal prova¬ 
bility in 3.42, and the completeness of these rules for finitistic 
provability is certainly not immediately convincing; cf. 1.832. It 
may be remarked that the completeness of the usual rules of logic 
with respect to logical validity was certainly practically certain for 
a long time, but a satisfactory theoretical foundation required that 
the known facts were pulled together and formulated in the basic 
completeness theorem (1.834). It is the analogue to this last step 
that is badly missing in 3.4. (For 3.5 the situation is better because 
predicative provability certainly implies truth in the hierarchy up
to $\omega_1$, cf. 1.533: this allows one to be sure that, for example, the
theorem of Cantor-Bendixson cannot be proved predicatively [29], 
cf. 2.63.) 

3.63. To avoid misunderstanding, two points should be noted. 

3.631. Both $\varepsilon_0$ and $\Gamma$ are “finally” reached as the limit of an
$\omega$-sequence, for example, $\varepsilon_0 = \lim \omega_n$, where $\omega_1 = \omega$, $\omega_{n+1} = \omega^{\omega_n}$.
“Since $\omega$-sequences are permitted and each $\omega_n$ is permitted, why not
$\varepsilon_0$?” One must not forget what is involved in the assertion that
$\omega_n$ is permitted, namely: $\omega_{n-1}$ iterations of the basic principle of
3.4 were needed to introduce $\omega_n$; so $\sum_n \omega_n$ steps are needed
to introduce the sequence $\omega_n$, that is, $\varepsilon_0$ steps. One has to know
that $\varepsilon_0$ itself is permitted before it can be seen to be reached by the
basic step of 3.4. Similarly for $\Gamma$: a typical impredicativity (relative
to the basic steps considered).
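
The tower in 3.631 can be written out explicitly. The following is a standard computation, added here as an illustration; it uses only the continuity of ordinal exponentiation in the exponent, and writes $\omega_1 = \omega$, $\omega_{n+1} = \omega^{\omega_n}$ as in the paragraph above.

```latex
\omega_1 = \omega,\qquad
\omega_2 = \omega^{\omega},\qquad
\omega_3 = \omega^{\omega^{\omega}},\ \dots,\qquad
\varepsilon_0 = \sup_{n} \omega_n .

% \varepsilon_0 is a fixed point of \xi \mapsto \omega^{\xi}:
\omega^{\varepsilon_0}
  = \omega^{\sup_n \omega_n}
  = \sup_n \omega^{\omega_n}
  = \sup_n \omega_{n+1}
  = \varepsilon_0 .
```

So each $\omega_n$ lies strictly below the limit, and the limit is only named by a definition that already refers to the whole sequence, which is the circularity the paragraph points to.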

3.632. Both in 3.4 and 3.5 much larger orderings than $\Gamma$ can be
defined, which are, impredicatively speaking, well orderings. (In
fact, the same bound $\omega_1$ for both by [65].) Briefly, these are:
well-founded finitist, resp. predicative orderings. What was analyzed
above were the notions of: finitist, resp. predicative well orderings.

3.6321. Recall 1.533: if, in the construction of the hierarchy, $T_\xi$ is
introduced provided, for some $\eta < \xi$, there is a well-founded
$T_\eta$-ordering of ordinal $\xi$ (not necessarily a $T_\eta$-well ordering), the
catching point is $\omega_1$: so $\omega_1$ “corresponds” to $\Gamma$. It would be interesting
to know what ordinal corresponds to $\varepsilon_0$, if the step 3.412 is continued
for suitable natural (recall [27]!) well orderings, perhaps in the sense
of footnote 38.

4. IMPREDICATIVE (FULL) CLASSICAL ANALYSIS 

A problem in formal proof theory (mainly for specialists). In 
the present section are collected a number of essentially equivalent 
formulations of classical analysis, which were obtained in the course 
of extending the interpretation 2.32 to full impredicative analysis. 45 
They have some technical interest. But, more important, they 
provide basic problems on constructive functions of finite type which 
test, and thereby further, our understanding of these objects. The 
properties needed in Section 2 were too elementary for ambiguities, 
for example, those of 2.331, to matter. 

Throughout this section lower-case letters denote numerical
variables, capitals denote variables over sets of natural numbers,
$\alpha$, $\beta$ denote sequences of numbers or type $\tau$ objects.

4.1. Basic Systems for Classical Analysis ([6] or [66]); all systems 
considered contain Peano’s axioms. 

4.11. The full comprehension schema:

$\bigwedge S \bigvee T \bigwedge x\,[x \in T \leftrightarrow A(x, S)]$

for arbitrary formulae $A$, not containing $T$, and classical logic.

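4.12. The Principle of Dependent Choices (the strongest form of
the axiom of choice expressible in the present notation):

$\bigwedge\alpha \bigvee\beta\, A(\alpha, \beta) \to$

$\bigwedge\alpha \bigvee\beta \bigwedge x\,\{\beta(\langle 0, x\rangle) = \alpha(x) \land A[\lambda y\,\beta(\langle x, y\rangle), \lambda y\,\beta(\langle x+1, y\rangle)]\}$.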

The principle implies classically the comprehension scheme [66]. 
It is not implied by the latter except when A in prenex form contains 
no universal set quantifier followed by an existential one. For 
independence [51], for exception [61]. However, this axiom of 
choice holds in ramified classical analysis, the analogue to 1.5, and 
thus 4.12 is finitistically reducible to 4.11. 
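
The shape of the principle in 4.12 can be illustrated computationally: a single sequence codes a whole chain of sequences, with row 0 the given one and each later row chosen from the one before. The sketch below is an illustration only. It assumes Cantor pairing as the coding of pairs and a hypothetical function `choose` that returns, for each sequence, a sequence standing in the relation A to it; neither choice comes from the text.

```python
import math


def pair(x, y):
    """Cantor pairing <x, y>; one possible coding of pairs (an assumption)."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(n):
    """Inverse of pair: return (x, y) with pair(x, y) == n."""
    w = (math.isqrt(8 * n + 1) - 1) // 2
    y = n - w * (w + 1) // 2
    return w - y, y


def dependent_choices(alpha, choose):
    """Return beta such that row 0 of beta is alpha and
    row x+1 of beta is choose(row x), where row x is y -> beta(pair(x, y))."""
    rows = [alpha]

    def beta(n):
        x, y = unpair(n)
        while len(rows) <= x:          # build the chain only as far as needed
            rows.append(choose(rows[-1]))
        return rows[x](y)

    return beta


# Hypothetical relation A(a, b): b(y) = a(y) + 1 for all y.
def choose(a):
    return lambda y: a(y) + 1


beta = dependent_choices(lambda y: y, choose)
```

With these choices row x of `beta` is the sequence y -> y + x, so `beta(pair(0, 5))` is 5 and `beta(pair(2, 3))` is also 5.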

45 More precisely, to the system 4.13, 4.2211 resp. based on the formal rules of
intuitionistic logic.

4.13. The formal rules of intuitionistic logic are used instead of
classical logic, the unrestricted axiom of choice, and any of the
following schemata (for all $A$):

4.131. $\neg\neg\bigwedge x\,[A(x) \lor \neg A(x)]$ or $[\bigwedge x\,\neg\neg A(x)] \to \neg\neg\bigwedge x\,A(x)$

or $\neg[\neg\bigwedge x\,A(x) \land \bigwedge x\,\neg\neg A(x)]$, [46], [66].
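
The three schemata of 4.131 are intuitionistically interderivable when each is taken as a schema for all $A$. The sketch below assumes the reading (1) $\neg\neg\forall x\,[A \lor \neg A]$, (2) $\forall x\,\neg\neg A \to \neg\neg\forall x\,A$, (3) $\neg[\neg\forall x\,A \land \forall x\,\neg\neg A]$, where the printed formulas are hard to read; $\forall$ stands for the universal quantifier of the text.

```latex
\begin{align*}
(2) \leftrightarrow (3):\quad
  & \neg(P \land Q) \leftrightarrow (Q \to \neg P)
    \text{ with } P = \neg\forall x\,A(x),\ Q = \forall x\,\neg\neg A(x). \\
(2) \to (1):\quad
  & \forall x\,\neg\neg[A(x) \lor \neg A(x)] \text{ is provable; apply (2) to }
    A(x) \lor \neg A(x). \\
(1) \to (2):\quad
  & \text{assume } \forall x\,\neg\neg A(x) \text{ and } \neg\forall x\,A(x).
    \text{ If } \forall x\,[A(x) \lor \neg A(x)], \\
  & \text{then } \forall x\,A(x), \text{ a contradiction; so }
    \neg\forall x\,[A(x) \lor \neg A(x)], \text{ contradicting (1)}.
\end{align*}
```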

N.B. The axioms 4.131 are not valid for the interpretation of 
the logical connectives in 2.31. [Also one may add the continuity 
axioms (2.511).] 

4.131 is formally a subsystem of 4.12. But 3.31 applied to 
analysis reduces 4.11 to 4.131. The reduction to 4.11 of the 
continuity axioms in addition to 4.131 is in [8] and alternatively 
in 4.223 below. 

4.2. Now various formal extensions of the bar theorem 2.6231 are 
used instead of the comprehension scheme. They are auxiliaries 
toward the main result of this section. 

4.21. First, systems with classical logic. 

4.211. Add 2.6231 with classical logic. The derivation from
the comprehension principle is standard; for the converse, cf. [41].

4.212. First extend the language of 4.11 by adding symbols for
functions of finite type, formally exactly as in [34] (cf. 2.22). Let
$\alpha$ be a variable of type $\tau^0$, $\beta$ a variable over finite sequences of type $\tau$
objects (by some suitable coding of finite sequences), $u$ a type $\tau$
variable. The formal generalization of 2.6231 is

4.2121. $\bigwedge\alpha\bigvee x\,R(\alpha, x) \land \bigwedge x\,y\,\alpha\,\beta\,\{[R(\alpha, x) \land x < y \to R(\alpha, y)] \land [R(\alpha, x) \to Q(\bar\alpha x)] \land [\bigwedge u\,Q(\beta * u) \to Q(\beta)]\} \to \bigwedge\beta\,Q(\beta)$.

To get a model for this in classical analysis, one first interprets 
the higher type variables by means of neighborhood functions, as 
in 2.422. Since, here, A aVxR(ct, x) expresses that a partial 
ordering of functions is well founded, the obvious proof of (the 
translation of) 4.2121 (in terms of neighborhood functions) needs 
the principle of dependent choices. So, as in 4.12, one uses the 
model of 4.1 provided by ramified classical analysis where an explicit 
well ordering of functions can be defined, as on page 105. 

The converse is more surprising. The full axiom of choice (4.12) 
follows from 4.2121 by classical logic and the axiom of choice 
restricted to quantifier free A as in 2.3332. Further, 4.2121 need 
only be applied to purely existential formulae R. (This restriction 
will be used immediately.) [41]

4.22. Now, intuitionistic logic, and quantifier free systems resp. 

4.221. Intuitionistic logic of order $\omega$, primitive recursive functions
of order $\omega$ [34] (cf. 2.23), the axiom of choice, 4.2121, and the
continuity axioms for objects $b$ of type $0^{(\tau^0)}$, $\alpha$, $\beta$ of type $\tau^0$:

$\bigwedge\alpha\bigvee x\bigwedge\beta\,[(\bigwedge z < x)(\alpha z = \beta z) \to b\alpha = b\beta]$.
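
The continuity axiom says that the value of b at a sequence depends only on a finite initial segment of that sequence. For a functional given as a program, one such bound can be read off by recording which arguments are actually inspected. The sketch below is an illustration under the assumption that `b` is a deterministic Python callable that uses its argument only by calling it; the names `modulus` and `probe` are hypothetical and not from the text.

```python
def modulus(b, alpha):
    """Return an x such that b(beta) == b(alpha) whenever beta agrees with
    alpha on all arguments z < x (assuming b is deterministic and only
    calls its argument)."""
    queried = []

    def probe(z):
        queried.append(z)          # remember which argument b looked at
        return alpha(z)

    b(probe)                       # run b once, recording its queries
    return max(queried) + 1 if queried else 0


# Hypothetical example: b(alpha) = alpha(alpha(0)), alpha(z) = z + 2.
b = lambda a: a(a(0))
alpha = lambda z: z + 2
x = modulus(b, alpha)              # b inspects arguments 0 and 2

# Any beta agreeing with alpha below x gives the same value.
beta = lambda z: z + 2 if z < x else 99
same = b(alpha) == b(beta)
```

Here `x` is 3 and `same` is true. The bound found this way depends on how `b` is programmed, not only on the functional it computes.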

4.2211. (For completeness:) 4.221 and continuity for all
formulae $A$:

$\bigwedge\alpha\bigvee x\,A(\alpha, x) \to \bigwedge\alpha\bigvee x\bigvee y\bigwedge\beta\,[(\bigwedge z < y)(\alpha z = \beta z) \to A(\beta, x)]$.

4.221 is reduced to classical analysis because all the axioms are 
satisfied by the model of 4.212. The converse, and the reduction of 
4.2211, are given in 

4.222. A quantifier free system of finite types: The system of
2.23, that is, of [34], is extended by the following definitional scheme
(bar recursion of finite type). Let $c$ be a variable over finite
sequences of type $\tau$ objects, $\mathrm{lh}(c)$ its length, $Y$ (the principal
variable) of type $0^{(\tau^0)}$, $[c]$ the sequence (type $\tau^0$ object) defined by the
condition that its first $\mathrm{lh}(c)$ members are $c$, the rest some fixed
constant, and the types of the variables $a$, $b$ are chosen so as to
be coherent. For all types considered, constants $\theta$ and axioms

$\theta(a, b, c; Y) = a(c)$ if $Y([c]) < \mathrm{lh}(c)$;

$\theta(a, b, c; Y) = b(\lambda u\,\theta(a, b, c * u; Y), c)$ otherwise;

are added, where $u$ is a type $\tau$ variable.

These axioms are formally derivable from 4.221 and hence 
reducible to classical analysis. 
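
The two defining equations of 4.222 can be tried out directly at the lowest type. The sketch below is an illustration only: it takes c to range over finite sequences of numbers (Python tuples), pads [c] with the constant 0, and calls the bar recursion constant `bar_rec`; all of these are assumptions made for the example. Termination is not guaranteed in general. It relies on Y inspecting only finitely much of its argument, and it makes no claim about which classes of functionals satisfy the scheme.

```python
def bar_rec(a, b, Y, c=()):
    """Bar recursion of 4.222 for finite sequences c of numbers."""
    def padded(n):                 # the infinite sequence [c]: c followed by 0s
        return c[n] if n < len(c) else 0

    if Y(padded) < len(c):         # bar reached: Y([c]) < lh(c)
        return a(c)
    # Otherwise hand b the values at all one-step extensions c * u.
    return b(lambda u: bar_rec(a, b, Y, c + (u,)), c)


# Hypothetical example 1: Y is constant 2, so the bar is reached at length 3.
leaves = bar_rec(
    a=lambda c: 1,                 # each node on the bar counts once
    b=lambda f, c: f(0) + f(1),    # look only at extensions by 0 and 1
    Y=lambda alpha: 2,
)

# Hypothetical example 2: Y inspects its argument, Y(alpha) = alpha(0) + 2.
depth = bar_rec(
    a=lambda c: len(c),
    b=lambda f, c: f(1),           # always extend by 1
    Y=lambda alpha: alpha(0) + 2,
)
```

In the first example the recursion visits the 8 binary sequences of length 3, so `leaves` is 8. In the second the bar is reached at the sequence (1, 1, 1, 1), so `depth` is 4.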

4.2221. (An auxiliary system!) Add to 4.221 the schema 4.2121
with $\bigwedge\alpha\,\neg\neg\bigvee x\,R(\alpha, x)$ instead of $\bigwedge\alpha\bigvee x\,R(\alpha, x)$ for purely
existential $R$, that is, $R(\alpha, x)$ has the form $\bigvee b\,R_1(\alpha, x, b)$, with
quantifier-free $R_1$.

Main Result. 4.2221 reduces to 4.222 when the interpretation of 
[34], that is, 2.321, is applied. 

More precisely, if $A$ is a formula of classical analysis, $A_1$ the
interpretation by 2.31 of the negative form of $A$ as defined in 2.3322,
then $A$ can be proved from 4.12 if and only if $A_1$ can be proved in
4.222.

Corollary. Classical analysis reduces to 4.222, that is, there is 
a finitist consistency proof of classical analysis relative to 4.222. 

For, by 4.212 (converse), classical analysis is essentially a sub¬ 
system of 4.2121 with purely existential R. By 3.31, this is reduced 
to intuitionistic logic and negative forms of the axioms, that is, 
4.2221 instead of 4.221. But, by 2.322, on the interpretation 2.321,
4.2221 is equivalent to 4.2121 itself. This reduction by easy stages
with interesting lemmas was given in [41] after a more complicated 
proof in [66]. 

4.223. If one wants to include 4.2211 one has to use the modified 
neighborhood functions of 2.422. 

4.3. For what notions of function of finite type is bar recursion
(4.222) valid? Not much seems to be known constructively without
additional hypotheses. If Church’s thesis is assumed, the following
cases (cf. 2.73) are settled:

4.31. Bar recursion at type 0, that is, for c in 4.222 ranging over 
finite sequences of numbers (type 0 objects), is false for effective 
operations. 

4.32. For continuous functionals we have 

4.321. (even without Church’s thesis) bar recursion at types 0
and 1 can be proved, for example, in 2.622, and $\theta$ has a recursive
neighborhood function,

4.322. bar recursion at type 2 is false if both Church’s thesis 
and the axiom 2.521 are assumed. 

N.B. the contrast with the classical theory of neighborhood
functions (4.2121); in the proof of 4.322, Church’s thesis
distinguishes between, and 2.521 connects, free choice sequences and
constructive functions. It is open whether the thesis can be
replaced by (2.531 and 2.511).

4.33. For the schemata of [44], a definition of $\theta$ can be written
down, but there are no evident axioms for functionals of finite type
to prove that $\theta$ is everywhere defined.

So the question 4.3 is wide open. 

4.4. Discussion . Practically speaking, the results 4.31 and 4.32 
are useful because, at the present time, we have no evident axioms 
for constructive functions of finite type which are not satisfied by 
the particular kinds of functionals considered in 4.3. But their 
significance for the proof theoretic study of classical analysis is 
strictly limited, even if we restrict ourselves to the particular idea 
of [46] which underlies the present section, namely this. The 
interpretation 2.32 can be formally applied to 4.131, which is 
interreducible with classical analysis; and one then asks: what 
plausible definition schemata are needed to satisfy (the interpretation
of) the axioms 4.131? 

We consider particularly the continuous case 4.32.

4.41. Continuity properties are not formally necessary because,
by 2.3331, the axioms 4.131 are satisfied if arbitrary nonconstructive 
functionals are used. 

4.42. More interesting, the analysis of [66, page 17] shows: it is 
sufficient (though not necessary) to solve the following 

Problem. Consider the finite type structure over N (the set of
natural numbers) and classes $C_\tau$ (of operations of each type)
satisfying the following conditions: (a) They are closed under the primitive
recursive operations of 2.23. (b) For $C$ of type $\sigma^0$, $Y$ of type $0^{(\sigma^0)}$,
$A$, $B$, $D$ with types that make sense: there should be functions $B$
and $C$ of $A$, $D$, $Y$ in $\bigcup_\tau C_\tau$ which solve the functional equations

$A(YC, B) = C(YC)$ and $DC = B[C(YC)]$.

One way of satisfying (b) is this: $\bigcup_\tau C_\tau$ should contain the $\theta$ of 4.222
for all possible types, but, by 4.41, it is not the only way. A purely
abstract study of the problem may be fruitful.

4.43. Finally, tricky definitions of classes of functionals aside 
(which satisfy 4.3), how promising is the general idea above in the 
light of present knowledge? That is, are the (negative) formal 
theorems of classical analysis valid on the interpretation 2.32 for the 
standard notion of functional of finite type? 

4.431. Note first, that on 2.523, negative formulae have the same 
meaning whether type $0^0$ variables are interpreted as constructive
functions or as free choice sequences. So, if the interpretation 2.32
is faithful to 2.523, it is not reasonable to impose continuity
requirements on the functionals used unless free choice sequences are to be
involved in proofs even of formulae not containing free choice variables 
(cf. pages 133-134). 

4.432. Next, consider 2.32 applied to the simplest nontrivial
case of 4.131, where $A(x)$ is $\bigvee y\,B(x, y)$ with quantifier-free $B$:

$\neg\neg\bigwedge x\,[\bigvee y\,B(x, y) \lor \neg\bigvee y\,B(x, y)]$, that is, $\neg\neg\bigwedge x\bigvee y\bigwedge z\,[B(x, y) \lor \neg B(x, z)]$.

(This is intuitionistically equivalent, by use of the
axiom of choice, to 4.11 with purely existential $A$.) On 2.32, writing
$C(x, y, z)$ for $B(x, y) \lor \neg B(x, z)$,

$\bigwedge x\bigvee y\bigwedge z\,C(x, y, z)$ means $(E\alpha)(x\,z)\,C(x, \alpha x, z)$;

$\neg\bigwedge x\bigvee y\bigwedge z\,C(x, y, z)$ means $(Ex^2 z^2)(\alpha)\,\neg C[x^2\alpha, \alpha(x^2\alpha), z^2\alpha]$.

If this is to be established without dubious hypotheses on
constructive functions $\alpha$, $x^2$ and $z^2$ must be defined for free choice
sequences, that is, have strong continuity properties.

4.4321. $\neg\neg\bigwedge x\bigvee y\bigwedge z\,C(x, y, z)$ means $(x^2 z^2)(E\alpha)\,C[x^2\alpha, \alpha(x^2\alpha), z^2\alpha]$;
but if 4.4321 is to be established without dubious hypotheses
it must not be assumed that $x^2$ and $z^2$ are necessarily defined for all
free choice sequences. With this assumption, 4.4321 can be proved
(even for arithmetic $B$), for example, in 2.622 by [66], but not
without: specifically, by [46], 6.2 and 7.2, not for the extension of
Church’s thesis in [44].


5. ABOUT FOUNDATIONAL PROBLEMS

We now review briefly the relevance of the technical results of
Sections 1-4 to the general problem on page 95.

5.1. (Hilbert’s version of) Formalism. Recall the introduction
and Section 3. The phenomena of reasoning are represented by 
combinatorial relations between formal expressions (words or 
formulae), so to speak, their outward and visible signs. Assertions 
which present themselves to us as being about abstract situations, 
are analyzed in terms of formal derivability of their representations. 
Only the manipulation of the formal expressions is counted as 
immediate experience. The fact that at least a partial
representation, that is, a descriptive scheme, can be obtained in this way by
use of simple mechanical rules is the discovery of formalization.
These rules represent the principles of reasoning (in the domain of
mathematics considered). The mere existence of a formalization
which generates all (or some) accepted assertions does not explain or,
as one says, justify the principles; in particular, it does not explain
their application inside or outside mathematics (cf. page 150).
Hilbert's program of consistency proofs or, more generally,
interpretations gives a precise and coherent scheme for such an
explanation in terms of a reduction to certain principles of reasoning, namely
combinatorial proofs. These are a minimum used in all scientific
reasoning (cf. footnote 31 for their relation to a mechanistic
conception).

Modulo the open questions of 3.6 concerning combinatorial proofs, 
the reduction cannot be carried out for a well-established part of 
current mathematics, namely first order classical or intuitionistic 
arithmetic. And, for a wider version of Hilbert’s program
(predicative proofs), it cannot be carried out for current intuitionistic
analysis and, a fortiori, not for classical analysis (2.63, 3.5).

For the areas where it can be carried out (3.3232) it provides a 
remarkable separation between (a) the reliability of combinatorial 
conclusions in reasoning which presents itself to us as being about 
certain abstract set theoretic or constructive objects, and (b) the 
abstract existential assumptions made in such reasoning. For 
wider areas this kind of separation is not possible (1.711). 

The machinery of formal systems plays not only the basic role 
(formalization) described above, but a high degree of technical 
development of this machinery is used in an essential way. In 
particular, rules are needed in place of axioms for formalizing the 
concept of finitist proof; and this formalization, in turn, is needed 
for establishing the limitations on Hilbert's original program. 

For reference below, note three things. 

5.11. Gödel's incompleteness theorem is of central importance
for the formalist conception. 

5.12. Doctrinaire formalism. Why do we not regard the
reasoning of classical number theory as simply unjustified, instead of
regarding 3.4 as a proof of the limitations of Hilbert's program?
Here it is to be remembered that the purpose of theory is not only to
organize a limited set of data (here: baby arithmetic) or even to
interpret and correct them, but, above all, to extend the range of
experience which can be theoretically understood. From this point
of view Hilbert's program, by itself, was always defective because 
it could justify principles, but not explain the choice of rules. 

5.13. The refutation of formalism illustrates the well-known 
fact that one has to go beyond narrow experience (for example, of 
school mathematics for which Hilbert's program can be carried out 
remarkably well) to get decisive results on a theory. This is only 
reasonable because a theory would not be expected to survive if it 
were in conflict with very crude experience! 46 
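
The formula quoted in footnote 46 can be checked in two lines. This is a worked illustration, with $s$ the distance fallen, $t$ the time, and $A$, $B$ constants as in the footnote.

```latex
\frac{ds}{dt} = B\,s
\quad\Longrightarrow\quad
s(t) = A\,e^{Bt}, \qquad A = s(0).

% A body released at rest at the origin has s(0) = 0, hence A = 0:
s(0) = 0 \;\Longrightarrow\; s(t) = 0 \ \text{for all } t .
```

So under velocity proportional to distance fallen, a body that has not yet fallen any distance never starts to fall.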

5.2. Realism and set theory. Here the emphasis is not on the 
process of reasoning, but on the results and, in particular, on the 
objects about which assertions are made. In consequence, if 
formalization is used as the descriptive scheme for reasoning, the 
justification of the rules consists in showing that the conclusions 

46 Galileo's first theory of falling bodies (velocity proportional to the distance
fallen) did not survive long because according to it a body at rest would never
start to fall ($s = Ae^{Bt}$) and this conflicts with very crude observation!

are valid, for example, 1.831, 1.835 for pure logic. While in 1.8 the
rules themselves were not problematic, in the untutored reasoning 
about sets they were, and led to contradictions. Closer
examination of what objects sets are (cumulative type structure) explains
the restrictions on the rules. This is why 1.2 was regarded as a
main result. As a theory, set theory is extremely attractive because
it has few primitives, and other mathematical objects (natural
numbers, real numbers, probabilities) are built up naturally: if it
was an achievement to build up the physical world from 100-odd
atoms, how much more striking to build up the world of
mathematics from two primitives! Furthermore, the laws for these
primitives are elegant and surprisingly clear (1.67). 

More generally, the realist conception is certainly very close 
to the way a good deal of mathematics presents itself to us, and 
(this was of course its prime reason) it explains the objectivity of 
mathematics, that is, agreement on results, by its being about
external objects with which we are in some kind of contact. As has
been pointed out in [33, 35], there is considerable similarity in the 
methods of acquiring knowledge in elementary mathematics and 
physics. Also, it would be agreed that the realist assumption of 
external mathematical objects is not more dubious than that of 
physical objects. 

The weakness lies elsewhere. Basically, the trouble seems to be 
that the realist view has never been developed far enough to be put 
to a real test, in sharp contrast to (Hilbert’s) formalism; cf.
footnote 13. More specifically, it will be very interesting to see what
considerations will settle the question of measurable ordinals (1.72, 
1.733). I do not know a formulation of the realist view, for which 
experience establishes that there are infinite sets (in the sense of set 
theory), let alone inaccessible ones. Concerning the relation to 
physical objects it is certainly unusual in physics to regard the 
existence of objects as established because we can think of them, 
especially in advanced physics, and, by 5.13, one should perhaps 
not restrict oneself to the elementary case. 

The important connection between set theoretic and arithmetic 
statements discovered in [32] (1.711) also differs in a fundamental 
respect from so-called observable consequences of physical
hypotheses. Thus, if arithmetic statements are correlated with observable
consequences [33], in some cases (on the basis of the known theory)
the set theoretic hypotheses have no observable consequences
(1.535, 1.65), among them the true axiom of choice and the dubious
continuum hypothesis. And in others, for example, existence of 
inaccessible ordinals, rival hypotheses are not in conflict, but one 
leaves a particular arithmetic question undecided while the other 
does not. By 5.12, this may be in support of the latter; 
but it seems to be unusual in the natural sciences to have 
rival theories where one knows that they will not conflict on 
observable consequences. 

Perhaps one has stressed too much how “daring” and problematic 
the realist view is, instead of developing it so as to permit some 
conclusive decision. In the light of what has happened in the case 
of formalism a decision may be possible. For comparison note 

5.21. Gödel’s incompleteness theorem does not conflict with the
realist view. In fact, if the subject matter is regarded as objective,
one has no right to expect an intelligible theory, least of all a
complete one in the form of a formal system! Further one has no
reason to expect that the primitive objects can be grasped clearly. 

5.22. While the particular realist conception formulated in the
set theory of Zermelo-Fraenkel extends the range of theoretical
understanding, for example, to establishing the consistency of the
rules of classical analysis (and more), the realist view tout court does
not: recall the remark above on whether there are infinitely large
sets. In other words, it is not clear whether, on this view as
presently developed, more or less formal reasoning is justified than
on 5.1. 

5.3. Idealism and Intuitionistic Mathematics. The idealist
conception is probably the most commonplace one at the present time:
mathematics is a free mental activity. As such it does not exclude
the existence of mathematical objects external to us; they would
have the same structure as the ideas involved, and so mathematics
would be about them too. Certainly, at an elementary level, one
does not ask oneself whether, for example, simple arithmetic
statements are about concrete realizations (finite sets of things), one's
ideas of such configurations, or about some abstract entities.
Further, even if one admits abstract objects external to us, at a certain
stage it would be quite in accordance with scientific practice to 
ignore them if the available information is dubious and confusing. 
This is done by the realist, too, who certainly (properly) ignores 
questions about the organization of the brain without denying that
this may be involved in a more delicate study of our relation to
these external objects. 

In this connection, at least as far as questions of evidence are
concerned, even the particular restriction of the idealist conception
discussed in Section 2 may recommend itself. Here (solipsism) one
confines oneself to thoughts which are again about the mathematical
activity itself and not about external objects. After all, in the
process of reasoning as it presents itself to us, we benefit from
teaching up to a certain elementary level, but after that understanding
seems to be very much every man’s own business. And if
understanding is regarded as something that happens here and now, one
adopts in effect the crude solipsist position. More pedantically, 
one seeks a theory of evidence whose parameters are those of the 
solipsist position. In short, there seems to be nothing outlandish 
in the general position. 

The weakness of the general idealist position, as in 5.2, is rather 
that it is not sharp enough. In particular, I know of no formulation 
which excludes taking the whole of formal set theory as laws of our 
ideas of collections existing independently of us (whether or not they 
do). In fact, here it would be natural to take set theory as a proper 
part of a more general idealistic conception which deals not only 
with properties that define collections, but more general properties 
and to consider their laws (logic). For such a logic of not
everywhere defined properties, the law of the excluded middle would not
be valid. The notion of collecting property could then be defined,
for example, as in [14], where $a$ is collecting if for each $t$, $\vdash t \in a$ or
$\vdash t \notin a$. This suggests the possibility of a foundation of the theory
of sets in terms of more general notions, but to date the axioms for 
such a notion are weak. 

5.31. More generally, the primitive concepts of an idealist
conception would not enter directly into mathematical practice, but would be
used for an analysis, that is, for foundations. See footnote 14, but 
also 3.31 where classical arithmetic concepts may be regarded as 
forming a crude substructure in which the delicate properties of 
(constructive) (Ex) and v are suppressed. 

5.311. A foundation along the lines of 5.31 is a kind of analogue 
to Hilbert’s program, but with this essential difference: unlike the 
latter it would not reject any classical statements (as ideal 3.113:
why should an idealistic conception reject ideal elements?) but
rather study statements about ideas which themselves are not
necessarily about objects external to us. The execution of this kind of
program may permit the outright definition of the (logical and
nonlogical) primitives of classical mathematics; just as the primitives
of arithmetic are defined in set theory. (Outright, and not only
contextually, in contrast to the elimination of free choice sequences
in 2.523 or 2.622.)

5.32. Concerning the sharper version of idealism (solipsism) 
developed in intuitionistic mathematics, the main results of Section 
2, summarized in 2.63, show that the currently justified formal rules 
cover roughly the part of analysis done at the turn of the century. 
Comparison with set theory is difficult because both the analysis 
of the basic notions of intuitionism along 2.2 and the theory of 
constructions of higher type in 2.8 are still quite crude. Not 
only technically, but also conceptually since the notion of set in 
the sense of the cumulative theory of types is certainly clearer than 
that of construction or proof. 

5.321. It is not yet excluded that the bulk of classical mathematics
can be interpreted in suitably sophisticated versions of 2.2 
or 2.8. This would correspond to Hilbert’s program with intuitionistic
proofs replacing combinatorial ones. As pointed out in 3.12(c), 
the interpretation problem is here more appropriate than the 
consistency problem since presumably at least all statements 
$\forall \alpha\, \exists x\, A(\alpha, x)$ for elementary $A$ would be regarded as significant. 
A whole section (Section 4) was devoted to proof theory of impredicative
analysis in view of the possibility just mentioned. Here, 
again, it is to be stressed that the proposal is not to use the intuitionistic
theories for mathematical practice, but for its foundation; both 
conceptually, and for a technical analysis of the power of particular 
axiomatic theories. This technical use has been shown already 
both in standard proof theory of arithmetic (3.241) and, somewhat 
less directly (cf. footnote 9), of set theory. 

5.322. There is work in the literature in a direction opposite 
to 5.321, namely to try and understand intuitionistic practice in 
terms of set theoretic concepts. For elementary parts, cf. [11]; 
for more highly developed areas, the results of [8] and the elimination
results of Section 2 (2.6241) are important. Here abstract 
theories which use the concept of proof, as in extensions of 2.21, 
present difficulties. But while this work is often useful or even 
essential for understanding the formal character of problems in 
intuitionistic mathematics, it has no direct interest for the idealist 
conception of mathematics. 

5.323. Gödel’s incompleteness theorem does not conflict directly 
with it, because there is no assumption that the laws of thought are 
mechanical (in the sense of: recursive). Cf. footnote 18 and 2.72. 

5.324. The idealist conception is not antiphysical. In fact, there 
seems to be no reason why the facts of mathematical experience 
might not lead to general conclusions about physical laws (footnote 30). 


5.4. Idealism: problems. From the discussion above (5.3, 5.323, 
5.324) it seems difficult to refute the idealist position by actual conflict 
with experience, or support it by establishing such conflict for its 
rival views. While such simple criteria of success have been sufficient
to decide between proposed theories in many areas of the 
natural sciences, different criteria may have to be applied here. (Or, 
in hackneyed language, the “operational significance” of the idealist 
position must be sought elsewhere.) 

Briefly, cf. page 98 of the introduction, what characterizes the 
difference between e.g. the idealist and realist view is what aspects 
of (crude) experience are regarded as significant and suitable for study. 
Thus, in particular, the realist would hardly deny the fact of a great 
deal of agreement on various (intensional) properties of proof as in 
footnote 15: rather, he would regard it as insignificant for mathematics,
and (probably) as not capable of precise theoretical treatment
of any kind. So, a basic step towards a decision on the idealist
position is this: 

5.41. Can one develop a theory of such intensional properties? 
One would see the strength of the idealist position in its furnishing 
notions and laws in terms of which a wider area of experience can be 
interpreted (just as in 5.12). Some case of this kind has been made 
above (5.321) for the intuitionistic concepts. Evidently, not all 
experience that presents itself to us is suitable: just as in the 
natural sciences many (perhaps most!) aspects of experience, 
though perfectly objective, are taken as physically insignificant, 
and one does not try to set up a theory to explain them. For 
example, experience of particular, so-called accidental, facts. 
Therefore an essential step in the development (of the idealist 
position) is to select an area of experience and set oneself specific 
tasks: just as for progress in pure mathematics it is essential to 
have mathematical problems (only now the mathematical formulation
of the task is part of the solution, not given with the problem). 

5.411. One such task, likely to be fruitful for the development of 
the idealist view, is the interpretation problem for classical mathematics
(5.31 or 5.321). 

5.412. But, perhaps, a more basic one is this, suggested by 3.4 
(despite the still fragmentary character of this work): to set up a 
theory for the detailed structure of proofs, not only for the class of 
provable theorems, in short, characterizations of informal notions 
of proof. Some essential difficulties are mentioned in 3.6, but 
autonomous progressions [28] are certainly a useful intermediate 
step, at least for some of these informal notions. Experience suggests
that there are relatively few such notions that are, as one says, 
basic in our mathematical experience, so that this area lends itself 
to theoretical analysis. At any rate (so it seems to me) the essence 
of the idealistic conception consists precisely in the assumption that 
this impression is correct. 

5.42. Granted all this, two questions arise: is mathematics 
needed in this subject? Is a (primarily formal) mathematical 
study promising? 

5.421. The need for an advanced mathematical treatment seems 
clear because the properties of proofs which are significant in reasoning
are complicated: so one needs highly developed technical 
machinery even to state approximate laws. 

5.422. It would, perhaps, be merely frivolous to study extremely 
primitive elements of reasoning, like recognition of symbols, without
regard for the physical processes involved. But, by 5.3 (page 
187), this criticism should not apply to the areas of reasoning here 
considered. After all, a good deal of physical theory of matter in 
bulk is quite independent of understanding the atomic structure of 
matter (in fact, the theory would never have come about if one had 
first waited for such understanding): only relatively crude qualitative
physical experience was used to good effect in early physical 
theory. An analogue to this is needed in the case of reasoning if a 
primarily mathematical analysis is to be promising. 

These lectures were prepared during a stay at the Institute for 
Advanced Study at Princeton, for which I am much indebted to 
Professors Gödel and Oppenheimer. It is a pleasure to acknowledge 
that my conversations with Professor Gödel (during this stay and 
on previous occasions) have helped me much, not least, to see some 
fruitful consequences of the general philosophic points made in his 
[33, 34, 35]. References in the text give a good idea of these consequences
just because the views are sharp, and not merely suggestive 
in a literary way. But, I hope, also those parts of the lectures 
which are quite independent or, as in Sections 2 and 5, contrary 
to the general points mentioned, have been sharpened by the 
conversations. 
4 

Some Recent Advances 
and Current Problems 
in Number Theory 

Paul Erdős 


The subject of number theory is very extensive and has intimate 
links with other branches of mathematics. Analysis has been 
applied with great success to various problems of number theory 
for 200 years and there is every reason to expect that such applications
will continue. Algebraic methods have also been applied to 
number theory and have in turn developed out of number theory. 
Recently algebraic geometry and probability theory have been 
applied with success to problems which previously seemed intractable.
In this chapter I clearly cannot hope even to attempt to give a 
complete survey of recent developments in number theory, and in 
quite a few of its branches I am not particularly competent to do 
so—for example, in the branches involving algebraic geometry. 
My paper will be highly subjective; I shall write mainly about 
questions which have interested me personally, and I certainly do 
not wish to suggest that any problems and results which I omit to 
mention are less important or interesting than the ones I shall write 
about a great deal. For instance, I overemphasize problems on 
primes and problems of a combinatorial type; also, of course, I overemphasize
my own work. I shall not write much about Waring’s 
problem since it has been dealt with in recent books [1]; I shall omit 
the geometry of numbers, and also Diophantine approximation 
since I recently wrote about this subject (see my forthcoming 
paper in Compositio Math.). The same fate will overtake many 
applications of probability to number theory, but several survey 
articles have appeared recently on this subject (some of them written 
by me) and there is also a recent book by Kubilius and a forthcoming
book by Rényi and myself [2]. Most of the questions with 
which I shall deal will have a combinatorial flavor or will relate to 
primes (or both); these are the subjects which have interested me 
most for the last thirty-three years. To quote from the introduction
to the well-known and excellent book of Hardy and Wright: 
“I cannot fail completely in making the paper interesting, since 
the subject is so attractive that this would need extravagant 
incompetence.” 

There will be some overlap between this paper and my recent 
paper “On unsolved problems” [2a]. 

I wish to thank my friends Davenport, Schinzel, and Turán for 
their valuable assistance. 

1. First, I shall discuss problems and results on the distribution 
of prime numbers (the letters p, q will denote primes throughout). 

Denote by $\pi(x)$ the number of primes not exceeding $x$ and by $c$, 
$c_1, \ldots$ absolute constants, not always the same. The Prime 
Number Theorem states that 

(1) $\lim_{x \to \infty} \dfrac{\pi(x)}{x / \log x} = 1.$

(1) was first proved in 1896 by Hadamard and de la Vallée Poussin. 
In 1948 Selberg and I obtained [3] an elementary proof of (1). Our 
starting point was the following remarkable formula of Selberg, 
which he proved in an elementary way: 

If $\vartheta(x) = \sum_{p \le x} \log p$ then 

(2) $\vartheta(x) \log x + \sum_{p \le x} \log p \; \vartheta\!\left(\dfrac{x}{p}\right) = 2x \log x + O(x).$

We then proved by elementary arguments that if $1 < p_1 < p_2 < \cdots$
is any sequence of real numbers which satisfies (2) and further 
satisfies 

(3) $\vartheta(x) > ax, \qquad \sum_{p \le x} \dfrac{\log p}{p} = \log x + O(1),$

then 

(4) $\vartheta(x) = x + o(x).$

This is well known to be equivalent to (1). 
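
As a numerical illustration (a minimal sketch, not part of the original argument), the quantities in (1), (2) and (4) can be tabulated for moderate $x$. The helper names `primes_up_to` and `selberg_check` are hypothetical, and the code assumes the standard reading $\vartheta(x) = \sum_{p \le x} \log p$ together with Selberg's formula in the form $\vartheta(x)\log x + \sum_{p \le x} \log p\,\vartheta(x/p) = 2x\log x + O(x)$.

```python
import math
from bisect import bisect_right
from itertools import accumulate


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def selberg_check(x):
    """Return (pi(x), theta(x), left side of Selberg's formula) for integer x."""
    ps = primes_up_to(x)
    logs = [math.log(p) for p in ps]
    prefix = [0.0] + list(accumulate(logs))  # prefix[j] = sum of the first j logs

    def theta(y):
        # theta(y) = sum of log p over primes p <= y
        return prefix[bisect_right(ps, y)]

    lhs = theta(x) * math.log(x) + sum(
        lp * theta(x / p) for p, lp in zip(ps, logs)
    )
    return len(ps), theta(x), lhs


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        pi_x, th, lhs = selberg_check(x)
        # The three ratios should all drift toward 1, the last one slowly,
        # since the error term O(x) is only smaller by a factor log x.
        print(x, pi_x * math.log(x) / x, th / x, lhs / (2 * x * math.log(x)))
```

The third ratio converges slowly, which is consistent with an error term of order $x$ against a main term of order $x \log x$.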

Beurling [4] gave the following interesting generalization of the 
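prime number theorem. Let $1 < p_1 < p_2 < \cdots$ be any sequence 
of real numbers, which will be called generalized primes. Denote 
by $N(x)$ the number of solutions of 

$\prod_i p_i^{a_i} \le x \quad (a_i = 0, 1, \ldots).$

Assume that 

(5) $N(x) = x + O\!\left(\dfrac{x}{(\log x)^{\gamma}}\right)$ for some $\gamma > 3/2$. 

Then it follows that 

(6) $\sum_{p_i \le x} 1 = (1 + o(1)) \dfrac{x}{\log x}.$
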

Beurling also observed that (6) cannot be deduced from a weaker 
error term than that in (5). Selberg observed that one can make 
the deduction of (6) from (5) by our elementary method if in (5) a 
slightly better error term is assumed (unpublished). Nyman and 
Malliavin [4] sharpened Beurling’s results in various ways. I later 
proved [3] that (2) alone implies (4) and Shapiro [3] proved that 
(2) with an error term $o(x \log x)$ instead of $O(x)$ also implies (4). 

My deduction of (4) from (2) was based on the following Tauberian
theorem, which seems of independent interest: 

Suppose $a_k \ge 0$ and assume that (with $s_n = \sum_{k=1}^{n} a_k$) 

(7) $\sum_{k=1}^{n} a_k (s_{n-k} + k) = n^2 + O(n).$

Then 

(8) $s_n = n + O(1).$

My original proof was very complicated; Siegel simplified it (unpublished)
and later Shapiro [3] simplified my proof considerably. 

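Bombieri and Wirsing [5] succeeded independently in proving in 
an elementary way that for every $k > 0$ 

(9) $\vartheta(x) = x + O\!\left(\dfrac{x}{(\log x)^k}\right).$

This, as is well known, implies that 

$\pi(x) = \int_2^x \dfrac{dy}{\log y} + O\!\left(\dfrac{x}{(\log x)^k}\right).$

This is a considerable advance on previous results of Van der 
Corput, Kuhn, Breusch, and others. 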

In my opinion the simplest deduction of (4) from (2) and (3) 
is that due to V. Nevanlinna [6], who somewhat simplifies the proof 
of Wright [1]. 

Put $\psi(x) = \sum_{n \le x} \Lambda(n)$, where $\Lambda(n) = \log p$ if $n = p^{\alpha}$ and is 0 otherwise. 
Tchebicheff observed that 

(10) $\sum_{n=1}^{\infty} \psi\!\left(\dfrac{x}{n}\right) = x \log x - x + o(x).$

The proof of (10) is elementary; in fact (10) easily follows from a 
weak form of Stirling’s formula. 

It would be very desirable to deduce the prime number theorem 
from (10) as far as possible in an elementary way. Sharpening a 
previous result of Landau, Ingham [7] proved by using Wiener’s 
theory that (10) implies $\psi(x) = x + o(x)$, which is equivalent to 
the prime number theorem. Recently Ingham and I proved the 
following theorem (our paper will appear in Acta Arithmetica): 

Let $1 < a_1 < a_2 < \cdots$ be a sequence of real numbers with $\sum_i 1/a_i < \infty$. 
Assume that $f(x)$ is an increasing function for which 

(11) $f(x) + \sum_i f\!\left(\dfrac{x}{a_i}\right) = x \left(1 + \sum_i \dfrac{1}{a_i}\right) + o(x).$

(11) implies $f(x) = x + o(x)$ if and only if $1 + \sum_i 1/a_i^{s}$ has no root 
of the form $s = 1 + it$, $t \neq 0$. It is possible that if the $a_i$ are integers 
this condition is always satisfied. The simplest case for which we 
cannot decide this question is $a_1 = 2$, $a_2 = 3$, $a_3 = 5$. Several 
related problems are discussed in our paper. 

The sharpest estimation of $\vartheta(x)$ is at present 

$\vartheta(x) = x + O\!\left(x \exp\left(-(\log x)^{3/5 - \epsilon}\right)\right),$

obtained by Korobov and Vinogradoff [8]. The Riemann Hypothesis 
would imply 

$\vartheta(x) = x + O(x^{1/2} \log x).$


Now I go on to state some problems and results about the distribution
of primes. Let $2 = p_1 < p_2 < \cdots$ be the sequence of 
consecutive primes. A well-known theorem of Tchebicheff states 
that $p_{n+1} < 2p_n$ for all $n$, and the prime number theorem implies 
that $p_{n+1}/p_n \to 1$. The sharpest upper bound for $p_{n+1} - p_n = d_n$ 
is due to Haneke [9]; he proved, sharpening previous results of 
Hoheisel, Haneke, Heilbronn, Ingham, and Min, that 

$d_n < p_n^{61/98 + \epsilon}.$

The Riemann Hypothesis would imply 

$d_n < p_n^{1/2 + \epsilon}.$


It has been conjectured that between two consecutive squares 
there is always a prime. This conjecture can probably not be 
deduced from the Riemann Hypothesis and seems to be very deep. 
Piltz conjectured that for every $\epsilon > 0$ and $n > n_0(\epsilon)$ 

$d_n < n^{\epsilon}.$

Cramér [10] conjectured that 

(12) $\limsup_{n \to \infty} \dfrac{d_n}{(\log n)^2} = 1.$

Cramér was led to his conjecture by probabilistic reasoning; the 
proof or disproof of (12) seems hopeless by the methods which are 
at our disposal at present. It has been known for a very long time 
that $\limsup d_n = \infty$ (the $n - 1$ integers $n! + 2, n! + 3, \ldots, n! + n$ 
are all composite). 
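
The parenthetical argument can be checked directly for small $n$: each $n! + j$ with $2 \le j \le n$ is divisible by $j$. The following is a minimal sketch under that reading; the names `is_prime` and `factorial_run` are hypothetical helpers introduced only for this illustration.

```python
import math


def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    for q in range(2, math.isqrt(m) + 1):
        if m % q == 0:
            return False
    return True


def factorial_run(n):
    """The n - 1 integers n! + 2, ..., n! + n, each paired with a proper divisor j."""
    f = math.factorial(n)
    return [(f + j, j) for j in range(2, n + 1)]


if __name__ == "__main__":
    for n in range(2, 9):
        run = factorial_run(n)
        # j divides n! (because j <= n) and j divides j, hence j divides n! + j.
        assert all(m % j == 0 and j < m for m, j in run)
        assert not any(is_prime(m) for m, _ in run)
        print(n, len(run), "consecutive composites starting at", run[0][0])
```

This only shows that gaps of length at least $n$ occur somewhere near $n!$; it gives no information on how early such gaps first appear.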

Sharpening previous results of Backlund, Brauer-Zeitz, and 
Westzynthius, I proved [11] by using Brun’s method that for 
infinitely many $n$ 

(13) $d_n > c \log n \log\log n / (\log\log\log n)^2.$

Chang [11] succeeded by a simple further idea in dispensing with 
Brun's method in the proof of (13), and Rankin using an improvement
of our method proved that for infinitely many $n$ and every 
$\epsilon > 0$ 

(14) $d_n > (\tfrac{1}{3} - \epsilon) \log n \log\log n \log\log\log\log n / (\log\log\log n)^2.$

(14) seems to be the natural boundary of our method; the only 
improvement of (14) in the last 26 years is due to Schönhage and 
Rankin, who replaced the constant $\tfrac{1}{3}$ by $e^{\gamma}$ [11]. 

A well-known and probably very difficult conjecture on primes 
asserts that 

(15) $\pi(x + y) \le \pi(x) + \pi(y).$

Hardy and Littlewood [12] proved by Brun’s method that 

(16) $\pi(x + y) - \pi(x) < \dfrac{cy}{\log y}.$

As far as I know this is the only time Hardy and Littlewood used 
Brun’s method. Selberg [12] improved (16) to 

(17) $\pi(x + y) - \pi(x) < \dfrac{2y}{\log y} + O\!\left(\dfrac{y \log\log y}{(\log y)^2}\right).$

It would be very important if one could replace 2 in (17) by a 
smaller constant, but this seems to be difficult. A slightly weaker 
conjecture than (15) states that, for every $\epsilon$ and $y > y_0(\epsilon)$, 

$\pi(x + y) - \pi(x) < \dfrac{(1 + \epsilon) y}{\log y}.$

Selberg’s investigations [12] on the limits of the efficiency of the 
sieve methods indicate that (17) cannot be improved by Brun’s 
method except possibly if very essential changes are made in the 
estimation of the error terms. 
It seems likely that $d_n/\log n$ is everywhere dense in $(0, \infty)$. 
Ricci [13] and I proved independently that the limit points of 
$d_n/\log n$ form a set of positive Lebesgue measure, but the only 
known limit point of this set is $\infty$. In particular I do not know 
if $d_n/\log n$ has a rational limit point. It seems certain that $d_n/\log n$ 
has a continuous distribution function $\psi(t)$. In fact, Bombieri conjectures
that $\psi(t) = 1 - e^{-t}$, in other words, the density of integers 
$n$ for which $d_n < t \log n$ equals $1 - e^{-t}$. There does not seem to be 
much hope of attacking this conjecture at present, but later in this 
paper I will state a modification which can probably be settled. 
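
The conjectured exponential law can at least be compared with data. The sketch below is a rough empirical check, not evidence of the conjecture: it measures the fraction of consecutive primes up to a bound whose gap is below $t$ times a logarithm. As an assumption made for this illustration, it normalizes by $\log p_n$ rather than $\log n$ (the two are asymptotically equal, but $\log p_n$ is closer to the mean gap at small heights). The function names are hypothetical.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def gap_fraction(limit, t):
    """Fraction of consecutive primes p < q <= limit with q - p < t * log(p)."""
    ps = primes_up_to(limit)
    hits = sum(1 for p, q in zip(ps, ps[1:]) if q - p < t * math.log(p))
    return hits / (len(ps) - 1)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 3.0):
        # Second column: empirical fraction; third column: 1 - exp(-t).
        print(t, gap_fraction(10**6, t), 1 - math.exp(-t))
```

Because gaps are even integers while $\log p$ is only about 14 at this height, the empirical fractions move in visible steps, so close agreement should not be expected.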

Ricci [13] proved by Brun's method that the lower density of 
the integers $n$ for which $d_n < \log n$ is positive, and in fact it is not 
hard to show that there is an $\epsilon > 0$ such that the lower density of 
the integers $n$ for which $d_n < (1 - \epsilon) \log n$ is also positive. Unfortunately
I cannot prove that the upper density of the integers $n$ 
for which 

$d_n > \log n$ 

is positive. It is not hard to deduce from Brun's method that 

$\sum{}' d_n > cx,$ 

where the dash indicates that the summation is extended over those 
$p_n < x$ for which $d_n > \log n$. But unfortunately nothing can be 
deduced from this because of possible very large values of $d_n$. In 
fact I just observe to my annoyance that I cannot show that 
$d_n/\log n$ has at least one finite limit point greater than or equal 
to 1. One could give by Brun's method a rough estimation for a 
constant $c$ so that $d_n/\log n$ certainly has a finite limit point $> c$, 
but as far as I know nobody has given an explicit value for $c$. 

I proved [14] that 

$\liminf d_n/\log n < 1$ 

(the prime number theorem immediately implies that the $\liminf$ is 
:< 1). Rankin [14] proved that the limit in question is < #£;l!tlcci 
[13] showed that it is < and finally Bombieri proved that it is 
< ff (unpublished). There seems to be no doubt that the lim in 
question is 0, but this seems very hard to prove; the well-known 
conjecture that there are infinitely many prime twins, that is, that 
dn = 2 has infinitely many solutions, would of course imply this. 

The sequence $d_n$, $n = 1, 2, \ldots$, behaves very irregularly. 
Turán and I [15] proved that the inequalities $d_{n+1} > d_n$ and 
$d_{n+1} < d_n$ both have infinitely many solutions. We have not been 
able to prove that $d_n > d_{n+1} > d_{n+2}$ or $d_n < d_{n+1} < d_{n+2}$ have 
infinitely many solutions. In fact we cannot disprove the existence 
of an integer $n_0$ so that for every $k > 0$, $d_{n_0 + 2k} > d_{n_0 + 2k + 1}$ 
and $d_{n_0 + 2k + 1} < d_{n_0 + 2k + 2}$. It is not known that $d_n = d_{n+1}$ 
has infinitely many solutions; Rényi and I [15] proved that the 
number of solutions of $d_n = d_{n+1}$, $1 \le n \le x$, is less than $cx/(\log x)^{3/2}$; 
very likely the true order is $cx/\log x$ but this seems difficult. 

It is not difficult to show that for infinitely many indices $n$, $d_n > d_{n+1}$, 
$d_n > d_{n-1}$ and for infinitely many indices $m$, $d_m < d_{m+1}$, 
$d_m < d_{m-1}$. 
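
For orientation, the three possibilities $d_{n+1} > d_n$, $d_{n+1} < d_n$ and $d_{n+1} = d_n$ can be counted over an initial segment of the primes. This is a minimal sketch with hypothetical helper names; a finite count of course says nothing about whether any pattern recurs infinitely often.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def gap_patterns(limit):
    """Count (ascents, descents, equalities) among consecutive gaps d_n, d_{n+1}."""
    ps = primes_up_to(limit)
    gaps = [q - p for p, q in zip(ps, ps[1:])]
    up = sum(1 for a, b in zip(gaps, gaps[1:]) if b > a)
    down = sum(1 for a, b in zip(gaps, gaps[1:]) if b < a)
    equal = len(gaps) - 1 - up - down
    return up, down, equal


if __name__ == "__main__":
    for limit in (10**3, 10**4, 10**5, 10**6):
        print(limit, gap_patterns(limit))
```

For primes up to 100 the gaps are 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4, 6, 6, 2, 6, 4, 2, 6, 4, 6, 8, which gives 12 ascents, 9 descents and 2 equalities.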

Sierpiński [16] observed that 

(18) $\limsup \min(d_n, d_{n+1}) = \infty.$

It is perhaps surprising that though the proof of $\limsup d_n = \infty$ is 
trivial the proof of (18) is much more difficult. 

Walfisz and Prachar [16] proved that the upper density of the 
integers $n$ for which 

$\min(d_n, d_{n+1}, \ldots, d_{n+k}) < \epsilon \log n$ 

tends to 0 for fixed $k$ together with $\epsilon$ (I slightly modified their result). 
Also I observed [16] that for every $c_1 > 0$ there exists a $c_2 > 0$ so 
that there are at least $c_2 \log n$ consecutive values $d_k, d_{k+1}, \ldots$ 
($k < n$), all of which are $> c_1$, but I do not know if this holds for 
every $c_1$ and $c_2$. Rényi and I [15] also showed that 

$\dfrac{c_1 n}{\log n} < \sum_{k=1}^{n} \dfrac{1}{d_k} < \dfrac{c_2 n \log\log n}{\log n};$

probably the upper bound is nearly best possible. 
Prachar and I [17] proved that 

$c_1 (\log x)^2 < \sum_{p_k < x} \left| \dfrac{p_{k+1}}{k+1} - \dfrac{p_k}{k} \right| < c_2 (\log x)^2.$


We further showed that if $k_i$ is a subsequence of the integers for 
which 

$\dfrac{p_{k_i}}{k_i} \le \dfrac{p_{k_{i+1}}}{k_{i+1}},$

then the density of this sequence is 0. 

Put $p_k/k = u_k$. We further observed that the prime number 
theorem implies, for $k > k_0(\epsilon)$, 

(19) $u_{[k(1+\epsilon)]} > u_k;$

on the other hand to every $l$ there are infinitely many values of $k$ 
for which 

(20) $u_k > u_{k+l}.$

There is a big gap between (19) and (20) which we cannot fill. 
In our paper we state the following further problems on the structure
of the sequence $u_k$: 

There are only a finite number of values of $k$ (possibly none) for 
which 

$\max_{1 \le i < k} u_{k-i} < u_k < \min_{1 \le i < \infty} u_{k+i}.$

We easily show that the density of the integers $k$ for which 
$u_k > u_{k+1}$ is positive. We cannot show that the same holds for 
the $k$ for which $u_k < u_{k+1}$. 

We do not know if $u_k < u_{k+1} < u_{k+2}$ or $u_k > u_{k+1} > u_{k+2}$ has 
infinitely many solutions. 

Returning to the question of $d_n$, I may mention that by using 
Brun’s method I proved [18] that 

$\limsup \dfrac{\min(d_n, d_{n+1})}{\log n} = \infty,$

but I cannot prove that 

$\liminf \dfrac{\max(d_n, d_{n+1})}{\log n} < 1 \quad \text{or} \quad \limsup \dfrac{\min(d_n, d_{n+1}, d_{n+2})}{\log n} = \infty;$

also I cannot prove that 

$\liminf \dfrac{d_{n+1} + \cdots + d_{n+k}}{k \log n} < 1 - c,$

where $c$ does not depend on $k$. 
It seems certain that the density of the indices $n$ for which 
$d_n > d_{n+1}$ is $\tfrac{1}{2}$, but this seems very hard to prove. I proved [19] 
though that for a sufficiently small $\epsilon > 0$ the lower density of the 
indices $n$ for which $d_n > (1 + \epsilon) d_{n+1}$ is positive, and the same result 
holds for $(1 + \epsilon) d_n < d_{n+1}$. 

Define $n_1 < n_2 < \cdots$ as follows: 

$d_{n_i} > d_n$ for all $n < n_i$. 

Very little is known about the sequence $n_i$; for example, I cannot 
prove that $n_{i+1} > n_i + 1$ for $i > i_0$. It is easy though to see that 
the density of the $n_i$ is 0. 
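
The indices $n_i$ are the positions of the record (maximal) prime gaps, and the first few can be listed by a direct scan. The following is a minimal sketch; `record_gaps` is a hypothetical name, and indices are 1-based so that $p_1 = 2$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def record_gaps(limit):
    """List (n, p_n, d_n) for every n whose gap d_n exceeds all earlier gaps."""
    ps = primes_up_to(limit)
    records, best = [], 0
    for n, (p, q) in enumerate(zip(ps, ps[1:]), start=1):
        if q - p > best:
            best = q - p
            records.append((n, p, best))
    return records


if __name__ == "__main__":
    for n, p, d in record_gaps(10**6):
        print(n, p, d)
```

The scan starts with two adjacent records, $n = 1$ and $n = 2$ (gaps 1 and 2 after the primes 2 and 3); the open question in the text is whether adjacent records can occur again for large $i$.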

Cramér [10] proved, assuming the Riemann hypothesis, that 

$\sum_{n \le x} d_n^2 < cx (\log x)^4.$

Very probably 

$\sum_{n \le x} d_n^2 < cx (\log x)^2,$

but this seems hopeless. It may even be true that 

(21) $\lim_{x \to \infty} \dfrac{1}{x (\log x)^2} \sum_{n \le x} d_n^2$ exists. 

It is not hard to prove that the lower limit in (21) is positive. 

Similar questions can be asked for other sequences of numbers. 
For example, let $s_1 < s_2 < \cdots$ be the sequence of squarefree 
numbers; it is well known and easy to prove that their density is 
$6/\pi^2$. I proved [20] that 

$\sum_{s_i < n} (s_{i+1} - s_i)^2 = cn + o(n).$

It seems very probable that for every $\alpha > 0$ 

(22) $\sum_{s_i < n} (s_{i+1} - s_i)^{\alpha} = c_{\alpha} n + o(n).$

(22) if true must be very difficult, since it would imply $s_{i+1} - s_i = o(s_i^{\epsilon})$
for every $\epsilon > 0$. My method breaks down for $\alpha > 2$ but it 
proves (22) for $\alpha \le 2$. The best upper bound for $s_{i+1} - s_i$ is 
due to Richert [20] who proved (sharpening a previous result of 
K. F. Roth) 

$s_{i+1} - s_i < c s_i^{2/9} \log s_i.$

It is easy to prove [20] that 

(23) $s_{i+1} - s_i > (1 + o(1)) \dfrac{\pi^2}{6} \log s_i / \log\log s_i,$

but as far as I know nobody has succeeded in replacing $1 + o(1)$ by 
$1 + c$ in (23). 

Denote by $Q(x)$ the number of squarefree integers not exceeding 
$x$. It is easy to prove that 

(24) $Q(x) = \dfrac{6}{\pi^2} x + O(x^{1/2}),$

and the prime number theorem gives $o(x^{1/2})$ in (24). One would 
expect the Riemann Hypothesis to give $O(x^{1/4 + \epsilon})$, but it seems that 
one can only deduce $O(x^{2/5 + \epsilon})$. The true order of magnitude of the 
error term in (24) is unknown. 
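
The counting function of the squarefree numbers and the mean square of their gaps are easy to tabulate. The sketch below is a numerical illustration only; it assumes the standard main term $6x/\pi^2$ for $Q(x)$, and the names `squarefree_flags` and `squarefree_stats` are hypothetical.

```python
import math


def squarefree_flags(n):
    """flags[m] == 1 exactly when m (1 <= m <= n) is squarefree."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = 0
    for q in range(2, math.isqrt(n) + 1):
        sq = q * q
        # Strike out every multiple of q^2.
        flags[sq::sq] = bytearray(len(range(sq, n + 1, sq)))
    return flags


def squarefree_stats(n):
    """Return (Q(n), Q(n) - 6n/pi^2, (1/n) * sum of squared gaps below n)."""
    flags = squarefree_flags(n)
    s = [m for m in range(1, n + 1) if flags[m]]
    count = len(s)
    mean_sq_gap = sum((b - a) ** 2 for a, b in zip(s, s[1:])) / n
    return count, count - 6 * n / math.pi**2, mean_sq_gap


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4, 10**5, 10**6):
        q, err, msg = squarefree_stats(n)
        print(n, q, round(err, 2), round(err / math.sqrt(n), 4), round(msg, 4))
```

In such tables the error stays far below $x^{1/2}$, and the last column appears to settle, in line with the asymptotic formula for the sum of squared gaps.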

One could try to generalize (22) as follows. Let $a_1 < a_2 < \cdots$ 
be an infinite sequence of integers satisfying $a_k/k^2 \to \infty$ and let 
$b_1 < b_2 < \cdots$ be the sequence of integers no one of which is a 
multiple of any $a_k$. Is it then true that 

(25) $\sum_{b_i < n} (b_{i+1} - b_i)^2 = Cn + o(n)?$

If $a_i = p_i^2$ we obtain (22). If instead of $a_k/k^2 \to \infty$ only $a_k < ck^2$
is assumed it is easy to see that (25) cannot hold; at present I 
cannot disprove that in this case 

$\sum_{b_i < n} (b_{i+1} - b_i)^2 < An$ 

remains true for a suitable $A$. 

In [14] I conjectured that if $1 = a_1 < a_2 < \cdots < a_{\varphi(n)} = n - 1$
are the integers relatively prime to $n$ then 

(26) $\sum_{i=1}^{\varphi(n) - 1} (a_{i+1} - a_i)^2 < \dfrac{cn^2}{\varphi(n)}.$

This conjecture seems to be an elementary version of (21) and should 
not be too difficult to prove. 
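
The sum of squared gaps between consecutive integers prime to $n$ can be computed exactly for small $n$. The sketch below assumes, as the natural normalization, that the sum is to be compared with $n^2/\varphi(n)$, and prints the ratio for the first few products of consecutive primes. The function name `coprime_gap_sum` is hypothetical.

```python
from math import gcd


def coprime_gap_sum(n):
    """Return (sum of squared gaps between consecutive integers prime to n, phi(n))."""
    a = [m for m in range(1, n) if gcd(m, n) == 1]
    total = sum((y - x) ** 2 for x, y in zip(a, a[1:]))
    return total, len(a)


if __name__ == "__main__":
    for n in (6, 30, 210, 2310, 30030):
        total, phi = coprime_gap_sum(n)
        # Ratio of the sum to n^2 / phi(n); boundedness of this ratio is the conjecture.
        print(n, phi, total, total * phi / n**2)
```

For $n = 30$ the integers prime to $n$ are 1, 7, 11, 13, 17, 19, 23, 29, with gaps 6, 4, 2, 4, 2, 4, 6, so the sum of squares is 128 and the ratio is $128 \cdot 8 / 900$.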
Hooley [20] in fact proved that for every $\alpha < 2$ 

$\sum_{i=1}^{\varphi(n) - 1} (a_{i+1} - a_i)^{\alpha} < \dfrac{c_{\alpha} n^{\alpha}}{\varphi(n)^{\alpha - 1}}$

and 

$\sum_{i=1}^{\varphi(n) - 1} (a_{i+1} - a_i)^2 < cn \log\log n.$

Put $n_k = 2 \cdot 3 \cdots p_k$. The $\varphi(n_k)$ integers relatively prime to 
$n_k$ in the interval $(1, n_k)$ might be expected to show a somewhat 
similar behavior to the primes. Let 

$1 = a_1^{(k)} < a_2^{(k)} < \cdots < a_{\varphi(n_k)}^{(k)} = n_k - 1$ 

be the integers relatively prime to $n_k$. Let us investigate to what 
extent this sequence satisfies the conjectures we stated about 
primes. First of all, it is not hard to deduce from Brun’s method 
that there are constants $c_1$ and $c_2$ such that every interval of length 
$c_1 (\log n_k)^{c_2}$ contains an integer relatively prime to $n_k$. 

A theorem of Mertens implies that 

$\dfrac{\varphi(n_k)}{n_k} = (1 + o(1)) \dfrac{e^{-\gamma}}{\log\log n_k} = (1 + o(1)) \dfrac{e^{-\gamma}}{\log k}.$

It is not hard to prove that (if $a_{i+1}^{(k)} - a_i^{(k)} = d_i^{(k)}$) the sequence 

$\dfrac{d_i^{(k)}}{\log k}$ 

is everywhere dense in $(0, \infty)$; in other words to every $\epsilon$ and $\eta$ there 
is a $k_0$ such that for $k > k_0$ every interval of length $\eta$ in $(\epsilon, 1/\epsilon)$ contains
a number of the form $d_i^{(k)}/\log k$. I have not been able to 
prove that 

$\dfrac{d_i^{(k)}}{\log k}$ 

has a distribution function (the precise meaning of this statement 
is obvious and is left to the reader). 

It seems probable that the number of integers $1 \le i < \varphi(n_k)$ for 
which $d_{i+1}^{(k)} > d_i^{(k)}$ is $[\tfrac{1}{2} + o(1)] \varphi(n_k)$, but as far as I know this has 
not been proved. I do not know the number of solutions of $d_{i+1}^{(k)} = d_i^{(k)}$,
but it would be easy to obtain crude upper and lower bounds. 
It is easy to see that for every $t$ and all $k > k_0(t)$ 

$d_i^{(k)} > d_{i+1}^{(k)} > \cdots > d_{i+t}^{(k)}$ 

is solvable. 

Sivasankaranarayana Pillai conjectured that 

(27) $\sum_{p_n < x,\; n \equiv 0 \pmod 2} d_n = [\tfrac{1}{2} + o(1)] x.$

(27) seems very hard to prove; Brun's method easily gives 

$\sum_{p_n < x,\; n \equiv 0 \pmod 2} d_n > cx.$

One can also conjecture that 

(28) $\sum_{i \equiv 0 \pmod 2} d_i^{(k)} = [\tfrac{1}{2} + o(1)] n_k,$

but I have not been able to prove this. 

Jacobsthal defines $g(n)$ to be the least integer such that among 
any $g(n)$ consecutive integers there is at least one relatively prime 
to $n$. Put 

$\max g(n) = C(r) + 1,$

where the maximum is taken over all the integers $n$ with $\nu(n) \le r$
(where $\nu(n)$ denotes the number of distinct prime factors of $n$). 
We have 

(29) $\dfrac{c_1 r (\log r)^2 \log\log\log r}{(\log\log r)^2} < C(r) < c_2 r^{c_3}.$

The left side of (29) follows from (13) and the right side can be 
easily obtained by Brun's method. Jacobsthal conjectured that 

(30) $C(r) < c_4 r^2.$

The exponent in (29) can be reduced by Selberg’s improvement of 
Brun's method, but (30) seems hopeless at present [21]. 
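
Jacobsthal's function is simple to compute by brute force for small $n$, since $g(n)$ equals the largest difference between consecutive integers prime to $n$, taken over one full period. The sketch below evaluates it for the first few products of consecutive primes; these are natural candidates for large values but are not claimed to realize the maximum in the definition of $C(r)$. The name `jacobsthal` is a hypothetical helper.

```python
from math import gcd


def jacobsthal(n):
    """Least g such that any g consecutive integers contain one prime to n."""
    best, last = 0, 1  # 1 is prime to every n
    # Scan one full period 1, ..., n + 1; n + 1 is again prime to n,
    # so the gap that wraps around the period is included.
    for m in range(2, n + 2):
        if gcd(m, n) == 1:
            best = max(best, m - last)
            last = m
    return best


if __name__ == "__main__":
    n = 1
    for r, p in enumerate((2, 3, 5, 7, 11, 13), start=1):
        n *= p
        print(r, n, jacobsthal(n))
```

A gap of length $b - a$ between consecutive integers $a < b$ prime to $n$ leaves $b - a - 1$ consecutive integers with none prime to $n$, which is why $g(n)$ is the largest gap itself.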

Now we discuss primes in arithmetic progressions. Dirichlet was 
the first to prove that every arithmetic progression $\{a + kd\}$ with 
$(a, d) = 1$ represents infinitely many primes. Many mathematicians
attempted without success to find an elementary proof, but 
finally Selberg [22] was successful. Denote by $\pi(a, d, x)$ the number 
of primes $\le x$ of the form $a + kd$. The prime number theorem for 
arithmetic progressions states that 

(31) $\pi(a, d, x) = (1 + o(1)) \dfrac{x}{\varphi(d) \log x}.$

It is not difficult to prove (31) by the method of Selberg and myself 
[22]. The generalized Riemann Hypothesis for L-functions would 
imply that 

$\pi(a, d, x) = \dfrac{1}{\varphi(d)} \int_2^x \dfrac{dy}{\log y} + O(x^{1/2} \log x)$

uniformly in $d$, and also that the least prime $p(a, d)$ in $a + kd$ is 
less than $d^{2 + \epsilon}$. 

Linnik [23] proved without using any hypothesis that 

$p(a, d) < c_1 d^{c_2}.$

Linnik’s proof has been simplified first by Rodosskij and still 
further recently by Turán and Knapowski [22]. 
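
The least prime $p(a, d)$ in a progression can be found by direct search for small moduli, which gives a feeling for how it compares with $\varphi(d) \log d$ and with $d^2$. This is a minimal sketch; the helper names are hypothetical, residues are taken in $1 \le a < d$, and a prime residue $a$ counts as a member of its own progression.

```python
import math
from math import gcd


def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    for q in range(2, math.isqrt(m) + 1):
        if m % q == 0:
            return False
    return True


def least_prime(a, d):
    """Least prime congruent to a modulo d, for 1 <= a < d with gcd(a, d) = 1."""
    if gcd(a, d) != 1:
        raise ValueError("a and d must be relatively prime")
    m = a
    while not is_prime(m):
        m += d
    return m


def worst_case(d):
    """Maximum of p(a, d) over the reduced residues a modulo d (d >= 2)."""
    return max(least_prime(a, d) for a in range(1, d) if gcd(a, d) == 1)


if __name__ == "__main__":
    for d in (7, 10, 30, 101, 210, 1000):
        phi = sum(1 for a in range(1, d) if gcd(a, d) == 1)
        print(d, worst_case(d), round(phi * math.log(d), 1), d * d)
```

For $d = 7$ the six values are 29, 2, 3, 11, 5, 13, so the worst case is $p(1, 7) = 29$.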

Turán [24] proved using the generalized Riemann Hypothesis that 
for all but $o[\varphi(d)]$ arithmetic progressions $a + kd$ 

(32) $p(a, d) < cd (\log d)^{2 + \epsilon}.$

Perhaps the exponent $2 + \epsilon$ can be replaced by $1 + \epsilon$ but this is 
very deep if true. I proved [24] using Brun’s method that for 
every $c_1 > 0$ 

$p(a, d) < c_1 \varphi(d) \log d$ 

for at least $c_2 \varphi(d)$ [where $c_2 = c_2(c_1)$] values of $a$. In the opposite 
direction, I could only show that there exists a constant $c_3$ and an 
infinite sequence $d_1 < d_2 < \cdots$ such that 

(33) $p(a, d_i) > (1 + c_3) \varphi(d_i) \log d_i$ 

for at least $c_4 \varphi(d_i)$ values of $a$. There seems no doubt that this 
holds for all sufficiently large $d$, but I could not prove it. The 
proof of (33) used Brun’s methods and thus strongly used special 
properties of primes. Perhaps the following general result holds: 
Let $a_1 < a_2 < \cdots$ be a sequence of integers for which 
Denote by $f(a, d, n)$ the smallest $a_i \equiv d \pmod n$. Then there is an 
infinite sequence $n_1 < n_2 < \cdots$ so that, for at least $c_1 n_i$ values of 
$d$ in $0 < d \le n_i$, 

(34) $f(a, d, n_i) > (1 + c_2) n_i \log n_i.$

Perhaps (34) holds for all sufficiently large $n$. 

Using the results of [24] Turán proved that for every irrational 
$\alpha > 1$ the sequence $p\alpha \pmod 1$ is uniformly distributed; later 
Vinogradov [24] proved this without any hypothesis, and by using 
his powerful methods of estimating trigonometric sums, he also 
obtains a fairly good estimation of the discrepancy of the sequence 
$p\alpha \pmod 1$. It follows easily from the uniformity of distribution 
that for every irrational $\alpha > 1$, $[n\alpha] = p$ has infinitely many solutions.
As far as I know it is not known whether there are infinitely 
many primes $p$ for which $[p\alpha] = q$. 

Now I want to say something about the comparative theory of 
prime numbers, a subject recently developed by Turán and Knapowski
[25]. The origin of this subject is to be found in the following
conjecture of Tchebicheff: put 

$f(x) = \sum_{p > 2} (-1)^{(p-1)/2} e^{-px}.$

Then Tchebicheff stated that $f(x) \to -\infty$ as $x$ tends to 0. This 
conjecture is still unproved and must be very deep since Hardy, 
Littlewood, and Landau [25] showed that it is equivalent to the 
Riemann Hypothesis for the L-function 

$1 - \dfrac{1}{3^s} + \dfrac{1}{5^s} - \dfrac{1}{7^s} + \cdots$

belonging to the modulus 4. 

Tchebicheff stated that his conjecture implies a preponderance 
of the primes $\equiv 3 \pmod 4$ over those $\equiv 1 \pmod 4$. Littlewood [25] 
proved on the other hand that 

$\pi(1, 4, x) - \pi(3, 4, x)$ 

changes sign infinitely often. 
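
The race between the two residue classes modulo 4 can be followed numerically. The sketch below counts primes in each class and reports the first $x$ at which the class $1 \pmod 4$ takes the lead; the helper names are hypothetical, and a finite computation naturally says nothing about infinitely many sign changes.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def race_mod4(limit):
    """Return (pi(1,4,limit), pi(3,4,limit), first x <= limit where class 1 leads)."""
    ones = threes = 0
    first_lead = None
    for p in primes_up_to(limit):
        if p % 4 == 1:
            ones += 1
        elif p % 4 == 3:
            threes += 1
        # The difference can only become positive at a prime of the form 4k + 1.
        if first_lead is None and ones > threes:
            first_lead = p
    return ones, threes, first_lead


if __name__ == "__main__":
    for limit in (10**2, 10**3, 10**4, 10**5, 10**6):
        print(limit, race_mod4(limit))
```

Up to 100 there are 11 primes of the form $4k + 1$ against 13 of the form $4k + 3$, and the class $4k + 3$ stays ahead or level for a long stretch before the first reversal.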

Turán and Knapowski [25] recently took up this subject and 
obtained a whole series of interesting results which seemed unattainable
previously. I just state here a few of them and must 
refer to their joint papers and to their forthcoming book on this 
subject. 

A modulus $d$ is called good if the L-functions $L(s, \chi)$ belonging to 
this modulus $d$ have no real root in the critical strip. The values 
$3 < d < 12$ are all good and possibly every modulus is good, but 
it is not even known that there are infinitely many good moduli. 

1. If $d$ is good and $T$ is sufficiently large, then the interval$^1$ 
$(\log_3 T, T)$ always contains an $x_1$ and an $x_2$ for which for any $l$ with 
$(l, d) = 1$ and $l \not\equiv 1 \pmod d$, 

tt(1, d, xi) - *(l, d, xi) > logs xi, 

K y/xi . 

t( 1 , d, x 2 ) — d, x 2 ) < ^ logs x 2 ; 

further $\pi(1, d, x) - \pi(l, d, x)$ has at least $c \log_4 T$ changes of sign in
$(0, T)$.

2. For good $d$ the interval $(\log_3 T, T)$ always contains an $x_1$ and
$x_2$ for which

w(xi) y/x [, 

t( 1 , d, xi) - - 77 T > :-logs *i, 

v ’ ’ <p(d ) log x\ 

. 7 r(x 2 ) y/~x 2 . 

3 . If d is good and T > T 0 and l is a quadratic residue (mod d) 
then the interval (T H , T) contains values xi and x 2 for which 

u / log T log 3 T\ 

*■(!» d, Xi) — t( l, d, xi) > T exp ^ log^T / 

. —14 ( log T logs T \ 

7 r(l, d, x 2 ) — ir(Z, d, x 2 ) < T exp ^ j 0 g 2 f ) 


4. If $d$ is good and $T > T_0$ then for every two distinct values
$l_1$ and $l_2$

    $\sum_{n \equiv l_1 \pmod d,\ n \le x} \Lambda(n) - \sum_{n \equiv l_2 \pmod d,\ n \le x} \Lambda(n)$

changes sign in (T, e WT ). 

¹ We write $\log_2$ for $\log\log$, and so on.

5. Knapowski [26] proved that for sufficiently large $T$

    $\pi(x) - \int_2^x \frac{dv}{\log v}$

changes sign at least $c \log_4 T$ times in $(0, T)$. (Riemann conjectured
that

    $\pi(x) < \operatorname{Li}(x)$

for all $x$ and Littlewood disproved this conjecture by showing that

    $\pi(x) - \int_2^x \frac{dv}{\log v} = \pi(x) - \operatorname{Li}(x)$

changes sign infinitely often [26].)

Knapowski and Turán use the new and surprising inequalities of
Turán which he developed in several of his papers and in his book
[25]. A new English edition of the book will soon appear and will
contain many interesting problems and new results. The inequalities
are analytic in nature but can also be considered as part of the
theory of Diophantine approximation, and in a certain sense they
can be considered as generalizations of Dirichlet's theorem. Here
I want to state only two problems in this theory, on which I also
worked.


Let $z_1 = 1$ and $|z_i| \le 1$ for $2 \le i \le n$. Put $s_k = \sum_{i=1}^{n} z_i^k$. Turán
conjectured that there exists an absolute constant $c$ such that, for
all $n$ and all choices of the $z$'s,

(35)    $\max_{1 \le k \le n} |s_k| > c$.

Atkinson [27] recently proved this conjecture, and in an unpublished
manuscript he showed that $c$ can be chosen to be $\frac{1}{3}$. Turán
further conjectured that to every $\varepsilon$ there is an $n_0$ such that for $n > n_0$

    $\max_{1 \le k \le n} |s_k| > 1 - \varepsilon$.

I observed [25] that there is a constant $c > 0$ and a sequence with
$z_1 = 1$, $|z_i| \le 1$ for $2 \le i \le n$ such that

(36)    $\max_{2 \le k \le n+1} |s_k| < \frac{1}{(1 + c)^n}$.

The contrast between (35) and (36) is striking. I was unable to
decide the existence of a sequence with $|z_i| \ge 1$ for $1 \le i \le n$ which
satisfies (36).
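The power sums in question are easy to compute for any concrete choice of the numbers $z_i$. The sketch below is illustrative only; `power_sums`, `max_power_sum` and `roots_of_unity` are hypothetical names. It evaluates the sums for the $n$-th roots of unity, a configuration in which the first $n - 1$ power sums vanish and the $n$-th equals $n$.

```python
import cmath

def power_sums(zs):
    """Return [s_1, ..., s_n] with s_k = sum of z**k over zs, where n = len(zs)."""
    n = len(zs)
    return [sum(z ** k for z in zs) for k in range(1, n + 1)]

def max_power_sum(zs):
    """Maximum of |s_k| over 1 <= k <= n."""
    return max(abs(s) for s in power_sums(zs))

def roots_of_unity(n):
    """The n-th roots of unity, starting with z = 1."""
    return [cmath.exp(2j * cmath.pi * j / n) for j in range(n)]

zs = roots_of_unity(6)
print([round(abs(s), 9) for s in power_sums(zs)])
```

Feeding other configurations with first element 1 and all moduli at most 1 into `max_power_sum` gives a quick experimental view of how small the maximum can be made.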

Very recently Turán told me the following conjecture: Assume
that the infinite sequence $s_k$, $1 \le k < \infty$, contains infinitely many
consecutive $(n - 1)$-tuples which are all 0. Then (essentially)

(37)    $z_j = e^{2\pi i j/n}$, $1 \le j \le n$.

Perhaps (37) can be deduced if we know only that the sequence $s_k$
contains two consecutive $(n - 1)$-tuples which are 0.

Before I leave the subject of prime numbers I would like to call
attention to some related questions: E. Jabotinsky and I and
independently and simultaneously V. Gardiner, R. Lazarus, N.
Metropolis, and S. Ulam considered a modification of the sieve of
Eratosthenes and we were led to several interesting questions, but
for this I must refer to our papers on this subject [28].

Very interesting questions are raised in a paper by Hawkins on
the so-called random sieve [29]; since this is perhaps not very well
known I give the necessary definitions. We define a “random”
sequence $a_i(t)$ as follows: Put $a_1 = 2$, and cross out each integer
3, 4, . . . with probability $1/2$. Let $a_2$ be the first integer which has
not been crossed out. Then cross out each of the integers $a_2 + 1$,
$a_2 + 2$, . . . with probability $1/a_2$ and let $a_3$ be the first integer
not crossed out, then cross out each of the integers $a_3 + 1$, . . .
with probability $1/a_3$, and so on. Thus we obtain the “random”
sequence $a_i = a_i(t)$, and Hawkins conjectured that for almost all $t$,
$a_i/(i \log i) \to 1$, but as far as I know this has never been
satisfactorily proved [29].
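The definition of the random sieve translates directly into a simulation. The following is a minimal sketch, assuming that Python's pseudo-random generator is an acceptable stand-in for the underlying probability space; the function name `hawkins_sieve` and the `seed` parameter are hypothetical.

```python
import random
from math import log

def hawkins_sieve(limit, seed=0):
    """One sample of the random sieve, restricted to integers <= limit."""
    rng = random.Random(seed)
    seq = [2]
    for n in range(3, limit + 1):
        # Every element a already in seq is smaller than n, and its stage
        # crosses out n independently with probability 1/a.
        # n joins the sequence only if it survives all of these stages.
        if all(rng.random() >= 1.0 / a for a in seq):
            seq.append(n)
    return seq

sample = hawkins_sieve(20000, seed=1)
i = len(sample)
print(i, sample[-1] / (i * log(i)))
```

The printed ratio is the quantity appearing in Hawkins's conjecture, evaluated at the last element of one sample; different seeds give different samples.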

Finally, I would like to call attention to the following result:
Let $p_1 = 3$, $p_2 = 5$, and let $p_k$ be the smallest prime greater than
$p_{k-1}$ for which

    $p_k \not\equiv 1 \pmod{p_i}$, $1 \le i < k$.

Then I prove that

(38)    $\lim_{k \to \infty} \frac{p_k}{k \log k \log\log k} = 1$.

The proof of (38) uses Tauberian arguments which are simpler than
those used in the elementary proof of the prime number theorem [30].
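The sequence of primes defined above can be generated directly from its definition. This is only a sketch for experimentation; the names `is_prime` and `erdos_prime_sequence` are hypothetical, and each new term is taken to be the next prime after the previous term that is not congruent to 1 modulo any earlier term.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def erdos_prime_sequence(count):
    """First `count` terms: start at 3, then take the next prime that is
    not congruent to 1 modulo any earlier term."""
    seq = [3]
    candidate = 5
    while len(seq) < count:
        if is_prime(candidate) and all(candidate % p != 1 for p in seq):
            seq.append(candidate)
        candidate += 2
    return seq

print(erdos_prime_sequence(7))
```

Dividing the $k$-th term by $k \log k \log\log k$ for larger $k$ gives a numerical impression of the limit relation, although the convergence is slow.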

6. Now I discuss some results in the arithmetic theory of
polynomials. Let $f(x) = a_0 x^n + \cdots + a_n$ be an irreducible
polynomial with integral coefficients. Denote by $\nu(p)$ the number of
solutions of the congruence

    $f(x) \equiv 0 \pmod p$.

The prime ideal theorem [22] states that

(39)    $\sum_{p < x} \nu(p) = (1 + o(1)) \frac{x}{\log x}$.

Shapiro [22] proved (39) in an elementary way by the method of
Selberg and myself. The proof is elementary in the sense that it
does not use function theory, but as in all the other proofs of (39)
he has to use algebraic number theory, that is, the theory of ideals.
It would of course be interesting to prove (39) without the use of
ideal theory, but perhaps this is not possible. I often tried without
success to prove without using ideal theory that

    $\sum_{\nu(p) > 0} \frac{1}{p} = \infty$.
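The counting function for solutions of the congruence can be tabulated by brute force, and the asymptotic relation then says that its average over primes tends to 1. The sketch below is an illustration under stated assumptions: the polynomial $x^2 + 1$ is chosen only as an example, and `nu`, `primes_up_to` and `average_nu` are hypothetical helper names.

```python
def nu(coeffs, p):
    """Number of x in 0..p-1 with f(x) = 0 (mod p).
    coeffs lists the coefficients of f from highest degree to lowest."""
    count = 0
    for x in range(p):
        value = 0
        for a in coeffs:          # Horner's rule, reduced modulo p
            value = (value * x + a) % p
        if value == 0:
            count += 1
    return count

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def average_nu(coeffs, limit):
    """Average of nu(p) over the primes p <= limit."""
    ps = primes_up_to(limit)
    return sum(nu(coeffs, p) for p in ps) / len(ps)

print(average_nu([1, 0, 1], 2000))   # f(x) = x^2 + 1
```

For $x^2 + 1$ the count is 2 for primes congruent to 1 modulo 4, 0 for primes congruent to 3 modulo 4, and 1 for the prime 2, so the average reflects the distribution of primes between the two classes.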

Knapowski and Turán (unpublished; see [25]) proved the following
theorem: Suppose $T > T_0(f)$. Then there are four numbers
$u_1, u_2, u_3, u_4$ satisfying

    $\log_3 T < u_2 \exp(-8(\log u_2)^{1/2}) < u_1 < u_2 < T$,

    $\log_3 T < u_4 \exp(-8(\log u_4)^{1/2}) < u_3 < u_4 < T$,

for which

    $\sum_{u_1 < p < u_2} \nu(p) - \int_{u_1}^{u_2} \frac{du}{\log u} > \frac{u_2^{1/2}}{\log u_2}$,

    $\sum_{u_3 < p < u_4} \nu(p) - \int_{u_3}^{u_4} \frac{du}{\log u} < -\frac{u_4^{1/2}}{\log u_4}$.

This theorem is new even for the case $f(x) = x$.

Denote by $\nu(m)$ the number of distinct prime factors of $m$. The
prime ideal theorem immediately implies that

    $\sum_{n=1}^{x} \nu[f(n)] = [1 + o(1)]\, x \log\log x$.
Turán [31] proved the following surprising result: Let $h(n)$ tend
to infinity together with $n$ as slowly as we please; then the density
of the integers $n$ for which the inequality

    $\log\log n - h(n)(\log\log n)^{1/2} < \nu[f(n)] < \log\log n + h(n)(\log\log n)^{1/2}$

does not hold is 0. The special case $f(x) = x$ is a classical result of
Hardy and Ramanujan [31].
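For the special case in which the polynomial is simply $x$, the concentration of the number of distinct prime factors around $\log\log n$ can be observed numerically. The following is a rough sketch with hypothetical function names `omega_table` and `fraction_in_window`; the window width `h` is treated as a constant rather than a slowly growing function, which is an assumption of the example.

```python
from math import log, sqrt

def omega_table(limit):
    """omega[m] = number of distinct prime factors of m, for 0 <= m <= limit."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:                 # no smaller prime divides p, so p is prime
            for m in range(p, limit + 1, p):
                omega[m] += 1
    return omega

def fraction_in_window(limit, h):
    """Fraction of 3 <= n <= limit with |omega(n) - loglog n| <= h * sqrt(loglog n)."""
    t = omega_table(limit)
    good = 0
    for n in range(3, limit + 1):
        ll = log(log(n))
        if abs(t[n] - ll) <= h * sqrt(ll):
            good += 1
    return good / (limit - 2)

print(fraction_in_window(100000, 2.0))
```

Increasing `h` pushes the printed fraction toward 1, in line with the density statement; since $\log\log n$ grows extremely slowly, the effect is only qualitative at this range.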

Halberstam [31] proved that the density of integers $n$ for which

    $\nu[f(n)] < \log\log n + c(\log\log n)^{1/2}$

equals $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{c} e^{-x^2/2}\,dx$. The special case $f(x) = x$ is contained
in a theorem of Kac and myself.

I proved [31] that the number of primes $p < x$ for which

(40)    $(1 - \varepsilon)\log\log p < \nu(p - 1) < (1 + \varepsilon)\log\log p$

is not satisfied is $o(x/\log x)$. Halberstam [31] proved the following
very much more general and more precise result. Suppose
$f(x) \ne cx$. Then the number of primes $p < x$ for which

    $\nu[f(p)] < \log\log p + c(\log\log p)^{1/2}$

equals 

Denote by d(n) the number of divisors of n . Titchmarsh [31] 
proved that 

p<x 

I proved [31] using (40) that 

(41) 

V<x 

and Haselgrove [32] proved that 

p<x 
Finally, Linnik [31], using his powerful new dispersion method,
proved that

Yt d( - p -1) = a+od)) *+<**>. 

p<x 

Van der Corput [32] proved that

    $c_1 x \log x < \sum_{n=1}^{x} d[f(n)] < c_2 x (\log x)^{a}$.

I proved [32] that

(42)    $\sum_{n=1}^{x} d[f(n)] < c_3 x \log x$.

The proof is elementary but not simple. Very likely

(43)    $\sum_{n=1}^{x} d[f(n)] = cx \log x + o(x \log x)$.

If true, (43) must be very hard to prove, since the prime factors
greater than $x$ make the sharp estimation of the sum (43) very
difficult. The constant $c$ in (43) will perhaps depend on the
polynomial $f(x)$. Bellman and Shapiro [32] proved (43) if $f(x)$ is of
degree 2, and in this case $c = 2$. Recently Hooley [32] proved that
if $f(x)$ is of degree 2 then

    $\sum_{n=1}^{x} d[f(n)] = 2x \log x + O(x^{a})$, $a < 1$.

Using Brun's method and the one with which I proved (42), I can
show that

    $\sum_{p < x} d[f(p)] < c_4 x$.

Perhaps if $f(x) \ne cx$ one can show by Linnik's method that

    $\sum_{p < x} d[f(p)] > c_6 x$.

Denote by $P(n)$ the greatest prime factor of $n$. Tchebicheff
[33] proved that

    $\lim_{x \to \infty} \frac{1}{x} P\left[\prod_{n=1}^{x} (1 + n^2)\right] = \infty$.
Nagell and Ricci [33] proved that if $f(x)$ is of degree greater than
1 then

    $P\left[\prod_{n=1}^{x} f(n)\right] > cx \log x$,

and I [33] proved that

    $P\left[\prod_{n=1}^{x} f(n)\right] > c_1 x (\log x)^{c_2 \log\log\log x}$.

By more complicated methods I can prove that

(44)    $P\left[\prod_{n=1}^{x} f(n)\right] > c_1 x \exp\left((\log x)^{c_2}\right)$.

I never published the proof of (44), which is fairly complicated.
The proof could be simplified a great deal if I could prove the
following purely combinatorial theorem [33]. To every $c_1$ there exists a $c_2$
so that if $A_1, \ldots, A_l$, where $l = [c_2^k]$, are sets each having at
most $k$ elements, then there are $c_1$ of them $A_{i_1}, \ldots, A_{i_{c_1}}$ which
have pairwise the same intersection. Rado and I [33] proved this
with $k!(c_1 - 1)^k$ instead of $[c_2^k]$. (44) seems to be the natural
boundary of my method. Very likely

(45)    $P\left[\prod_{n=1}^{x} f(n)\right] > x^{1+c}$,

but this seems very difficult. (44) would follow easily if we could
prove that the number of integers $n < x$ for which all prime factors
of $f(n)$ are $< x$ is greater than $cx$, but this has not even been proved for
$f(x) = 1 + x^2$.
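The combinatorial statement mentioned above, about sets that pairwise have the same intersection, can be tested on small families by exhaustive search. The following sketch is only an illustration; the function name `find_same_intersection` is hypothetical and the search is exponential, so it is suitable for toy examples only.

```python
from itertools import combinations

def find_same_intersection(sets, count):
    """Return `count` members of the family whose pairwise intersections all
    coincide, or None if no such subfamily exists."""
    family = [frozenset(s) for s in sets]
    for group in combinations(family, count):
        core = frozenset.intersection(*group)
        # The pairwise intersections coincide exactly when each of them
        # equals the common intersection of the whole group.
        if all(a & b == core for a, b in combinations(group, 2)):
            return list(group)
    return None

# Nine distinct 2-element sets: more than 2!(3 - 1)^2 = 8 of them.
pairs = [set(c) for c in combinations(range(5), 2)][:9]
print(find_same_intersection(pairs, 3))
```

The example family exceeds the bound $k!(c_1 - 1)^k$ with $k = 2$ and $c_1 = 3$, so a suitable triple must exist; the three edges of a triangle show that three 2-element sets alone need not contain one.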

It seems probable in fact that

    $P\left[\prod_{n=1}^{x} f(n)\right] > cx^k$

when $f(x)$ is a polynomial of degree $k$.

A well-known result of Pólya [34] states that if the degree of
$f(x)$ is $> 1$ then

(46)    $\lim_{n \to \infty} P[f(n)] = \infty$.
If $f(x) = 1 + x^2$ Mahler and Chowla [34] (independently) proved
that

(47)    $P[f(n)] > c \log\log n$.

(47) is certainly very far from being best possible, but I do not know
of any reasonable upper bound for $P[f(n)]$ which is valid for infinitely
many $n$. [Added in proof: Schinzel just informed me that he showed
that for every $t$ there are infinitely many $n$ for which
$P(n^2 + t) < \exp(c \log n/\log\log\log n)$.]

Another result of Pólya [34], related to (41), states that if
$p_1, \ldots, p_k$ is any finite set of primes and $a_1 < a_2 < \cdots$ is the set
of all integers composed of the $p$'s, then $a_{i+1} - a_i$ tends to infinity.
This was improved by Siegel [34] to

(48)    $a_{i+1} - a_i > a_i^{1-\varepsilon}$

for every $\varepsilon > 0$ if $i > i_0(\varepsilon)$. It is easy to see that if $k > 1$ then

(49)    $\frac{a_{i+1}}{a_i} \to 1$

as $i \to \infty$. There is a gap between (48) and (49) which as far as I
know has not yet been filled.

Here I would like to mention a problem of Wintner which he
communicated to me orally. Does there exist an infinite sequence
of primes $p_1 < p_2 < \cdots$ such that if $a_1 < a_2 < \cdots$ is the
set of all the integers composed of the $p$'s then

(50)    $\lim_{i \to \infty} (a_{i+1} - a_i) = \infty$?

It seems certain that such a sequence exists, but I was unable to
prove this.

One final result about greatest prime factors. It is not difficult to
prove that for every $n$

    $P(2^n - 1) > n$.

Schinzel [34] recently proved that for $n > 12$

(51)    $P(2^n - 1) > 2n$.

The proof of (51) is surprisingly complicated. Very likely

    $\lim_{n \to \infty} P(2^n - 1)/n = \infty$.

As far as I know $P(n! + 1)$ has not yet been investigated.
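The inequality for the greatest prime factor of $2^n - 1$ can be checked for small exponents by factoring. The following is a small sketch using plain trial division; the name `greatest_prime_factor` is hypothetical, and the method is only practical for modest $n$.

```python
def greatest_prime_factor(m):
    """Largest prime factor of m >= 2, found by trial division."""
    best = 1
    d = 2
    while d * d <= m:
        while m % d == 0:
            best = d
            m //= d
        d += 1
    # Whatever remains above 1 is a prime larger than every factor found so far.
    return m if m > 1 else best

for n in (12, 13, 20, 30):
    print(n, greatest_prime_factor(2 ** n - 1))
```

The case $n = 12$ shows why the restriction $n > 12$ is needed: $2^{12} - 1 = 3^2 \cdot 5 \cdot 7 \cdot 13$, whose greatest prime factor 13 is smaller than 24.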
The polynomial $x^2 + x + 2$ is even for all integral values of $x$,
thus it cannot represent any odd prime. A well known elementary
theorem states that if $f(x)$ is of degree $k$ and $f(x) \equiv 0 \pmod n$ for
all integral values of $x$ then $n \mid k!$. One could conjecture that $f(n)$
represents infinitely many integers of the form $a \cdot p$ where $a \mid k!$.
As already stated Dirichlet proved this for polynomials of degree 1 in
1837. But the conjecture has never been proved even for a single
polynomial of degree greater than 1. The only result for expressions
of degree greater than 1 is due to I. I. Pjatezkij-Schapiro [35].
He proved that the number of primes of the form [n c ] in 1 < n < x 
is (1 + o(l))x/(l + c) log x if 1 < c < . 

The result very likely holds for all nonintegral c > 1. 

Heilbronn [36] proved using Brun's method that the number of
integers $n < x$ for which $f(n)$ is a prime is less than $cx/\log x$; also
it follows from Brun's method that there is an absolute constant $c_1$,
depending only on the degree of $f(x)$, such that $f(n)$ represents
infinitely many integers having fewer than $c_1$ prime factors.

Another conjecture states that $f(x)$ represents infinitely many
integers of the form $aQ$, where $a \mid k!$ and $Q$ is squarefree. It is well
known and easy to prove that $f(x)$ represents infinitely many $k$th
power-free integers, and in fact the density of the integers $n$ for
which $f(n)$ is $k$th power-free is positive ($k$ being the degree of $f(x)$).

I proved [37] that if $k > 2$ then $f(x)$ represents infinitely many
$(k - 1)$-th power-free integers, the only exception being that if
$k = 2^l$ then it may happen that $f(n) \equiv 0 \pmod{2^{k-1}}$ for all $n$, but
then $f(x)$ represents infinitely many integers of the form $2^{k-1} Q$ where
$Q$ is odd and $(k - 1)$-th power-free. The proof is fairly complicated.
We would expect that the density of the integers for
which $f(n)$ is $(k - 1)$-th power-free is positive, but I could not prove
this. I could prove nothing about the representation of $(k - 2)$-th
power-free numbers, for example, I cannot show that $n^4 + 2$ represents
infinitely many squarefree numbers.

As far as I know the question whether $2^n \pm 1$ represents infinitely
many $k$th power-free integers is intractable at present, and the same
is true for $n! \pm 1$.

7. Now I want to discuss a set of problems which could be said
to belong to combinatorial number theory, that is, the questions
have both number-theoretic and combinatorial character. These
problems perhaps do not all have great importance but they are very
close to my heart (or rather I should say to my brain) since two of
my main interests are number theory and combinatorial analysis.

I will start with Van der Waerden's theorem [38]. This asserts
that if one splits the integers into two classes in any way then at
least one of them contains an arbitrarily long arithmetic progression.
We shall be more concerned here with the finite form of this theorem,
also proved by Van der Waerden: For every $k$ there exists a smallest
integer $f(k)$ such that if we split the integers $1 \le t \le f(k)$ into two
classes then at least one of them contains an arithmetic progression
of $k$ terms. The upper bound obtained for $f(k)$ is enormously
large; the reason for this is that all proofs use a double induction.
Denote by $f(k, l)$ the smallest integer such that if we split the
integers $1 \le t \le f(k, l)$ into $l$ classes then at least one of them contains
an arithmetic progression of $k$ terms. The induction is carried out
with respect to $k$ and $l$ and so gives a very poor estimation for
$f(k, l)$ and in particular for $f(k) = f(k, 2)$. At least I believe that
the estimation is very bad, though no one has succeeded in obtaining
any better one. Rado and I [38] obtained the first nontrivial
lower bound for $f(k)$, by proving that $f(k) > ((k - 1)2^k)^{1/2}$. The
proof is based on the following simple consideration: The total
number of ways of splitting $n$ integers into two classes is clearly $2^n$,
and the number of splittings such that one of the two sets contains
a given arithmetic progression of $k$ terms is easily seen to be $2^{n-k+1}$,
and since there are fewer than $n^2$ arithmetic progressions all of
whose terms are $\le n$ we obtain that the total number of ways of
splitting the integers $1 \le t \le n$ so that one of the sets will contain an
arithmetic progression of $k$ terms is at most $n^2 2^{n-k+1}$. This is less
than $2^n$ if $n < 2^{(k-1)/2}$, whence $f(k) > 2^{(k-1)/2}$, and a more careful
estimation of the number of arithmetic progressions gives
$f(k) > [(k - 1)2^k]^{1/2}$.
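For very small $k$ the finite form of Van der Waerden's theorem can be verified by trying every splitting into two classes. The sketch below is a brute-force illustration only; `has_mono_progression` and `van_der_waerden` are hypothetical names, and the running time grows so quickly that anything beyond $k = 3$ is impractical this way.

```python
from itertools import product

def has_mono_progression(coloring, k):
    """True if some k-term arithmetic progression of positions has one colour (k >= 2)."""
    n = len(coloring)
    for a in range(n):
        for d in range(1, (n - 1 - a) // (k - 1) + 1):
            if len({coloring[a + j * d] for j in range(k)}) == 1:
                return True
    return False

def van_der_waerden(k, classes=2):
    """Smallest n such that every splitting of 1..n into `classes` classes
    contains a k-term arithmetic progression inside one class."""
    n = 1
    while True:
        if all(has_mono_progression(c, k) for c in product(range(classes), repeat=n)):
            return n
        n += 1

print(van_der_waerden(3), ((3 - 1) * 2 ** 3) ** 0.5)
```

The second printed number is the lower bound $((k - 1)2^k)^{1/2}$ for $k = 3$, which equals 4 and so lies well below the exact value found by the search.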

W. Schmidt [38] obtained by a difficult and ingenious improvement
of our method

(52) 

and this is the best known lower bound for $f(k)$ up to the present
time.

Using Van der Waerden's theorem, A. Brauer [39] proved that
to every $k$ there is a $p_0(k)$ so that if $p > p_0(k)$ then $p$ has $k$
consecutive quadratic residues and also $k$ consecutive quadratic
nonresidues (in fact Brauer proved a somewhat more general theorem).
Probably the right order of magnitude of $p_0(k)$ is $\exp ck$. It
can be deduced from the results of A. Weil on congruences in two
variables that $p_0(k) < \exp ck$.

In the same way as for the well-known theorem of Ramsey, define
$g(k, l)$ as the smallest integer for which if we split the integers
$1 \le t \le g(k, l)$ into two classes then either the first class contains
an arithmetic progression of $k$ terms or the second an arithmetic
progression of $l$ terms. If $k < l$ then clearly $f(k) \le g(k, l) \le f(l)$,
but I do not know of any nontrivial estimation of $g(k, l)$; in
particular it would be interesting to have upper and lower estimations of
$g(3, l)$.

Let $h(n)$ be an arbitrary number-theoretic function which takes
on the values $+1$ and $-1$. Van der Waerden's theorem asserts
that for every $k$ there is an arithmetic progression for which
$h(a) = h(a + d) = \cdots = h[a + (k - 1)d]$. For a long time I
conjectured that for every $c$ there exist a $d$ and an $m$ such that

(53)    $\left|\sum_{k=1}^{m} h(kd)\right| > c$;

more precisely, perhaps there exists a constant $c_1$ such that for every
function $h(n)$ and every $x$ there exist $d$ and $m$ with $md \le x$ such that

(54)    $\left|\sum_{k=1}^{m} h(kd)\right| > c_1 \log x$.

It is easy to see that (54) if true is best possible. (53) requires
much less than Van der Waerden's theorem but the arithmetic
progressions are much more restricted. K. F. Roth recently proved
(to appear in Acta Arithmetica) that there exist $a > 0$, $d < n^{1/2}$,
$a + md \le n$ for which

(54')    $\left|\sum_{k=1}^{m} h(a + kd)\right| > cn^{1/4}$.

Roth in fact proves a more general theorem. In conversation Roth
raised the question whether if we drop the condition $d < n^{1/2}$, then
$cn^{1/4}$ can perhaps be replaced by $cn^{1/2}$. I showed by probabilistic
reasoning that (54') is false in general with $cn^{1/2}$ when $c$ is sufficiently
large. It is probably false with $cn^{1/2}$ for every $c > 0$ if $n > n_0(c)$,
but I have not been able to show this.
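The conjecture on sums of $h$ over the multiples of $d$ can be probed for the smallest bound by exhaustive search. The sketch below is illustrative only; `longest_low_discrepancy`, `bound` and `cap` are hypothetical names. It looks for the longest sequence of values $\pm 1$ in which every sum $h(d) + h(2d) + \cdots + h(md)$ stays within the bound.

```python
def longest_low_discrepancy(bound=1, cap=64):
    """Length of the longest +-1 sequence h(1..N), N <= cap, such that
    |h(d) + h(2d) + ... + h(md)| <= bound for all d and m with md <= N."""
    best = 0
    h = [0]  # h[0] is unused; h[n] is +1 or -1 for n >= 1

    def ok(n):
        # Only sums ending exactly at the new index n can become too large.
        for d in range(1, n + 1):
            if n % d == 0:
                if abs(sum(h[j] for j in range(d, n + 1, d))) > bound:
                    return False
        return True

    def extend(n):
        nonlocal best
        best = max(best, n - 1)   # a valid sequence of length n - 1 exists
        if n > cap:
            return
        for value in (1, -1):
            h.append(value)
            if ok(n):
                extend(n + 1)
            h.pop()

    extend(1)
    return best

print(longest_low_discrepancy(1))
```

With bound 1 the search stops by itself at a short length, so every longer sequence has some sum of absolute value at least 2. Larger bounds quickly outgrow this naive search, which is why the `cap` parameter is there.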
Assume now that $h(n) = \pm 1$ is multiplicative, that is,
$h(a \cdot b) = h(a) \cdot h(b)$. Then (53) would imply that

(55)    $\left|\sum_{k=1}^{m} h(k)\right|$

is unbounded. (55) seems quite difficult [38]. If $h(p^a) = (-1)^a$
we obtain Liouville's function $\lambda(n)$ and (55) is well known in this
case, in fact

    $\sum_{k=1}^{n} \lambda(k) \ne o(n^{1/2})$.

An interesting and beautiful conjecture on multiplicative
functions $h(n) = \pm 1$ states that

(56)    $\lim_{x \to \infty} \frac{1}{x} \sum_{n=1}^{x} h(n)$

always exists and that the limit of (56) is 0 if and only if

(57)    $\sum_{h(p) = -1} \frac{1}{p} = \infty$.

If (57) does not hold it is easy to see that the limit (56) exists and is
different from 0. The conjecture (56) does not seem easy; it
certainly contains the prime number theorem, for if $h(n) = \lambda(n)$, (56)
is well known to be equivalent to the prime number theorem.

Wintner observed that if we only assume $|h(n)| = 1$ then (56)
does not have to hold. Rényi recently observed that
$h(n) = e^{i \log n}$ provides a simple counterexample. [Added in proof: Wirsing
just informs me that he proved (56).]

If $h(n)^k = 1$ for some $k$, (56) probably remains true.

Another variant of Van der Waerden's theorem is the following:
Define $f(k, c)$ ($0 < c \le 1$) as the smallest integer such that for
every $h(n) = \pm 1$ there exists an arithmetic progression

    $0 < a < a + d < \cdots < a + (k - 1)d \le f(k, c)$

for which

    $\left|\sum_{r=0}^{k-1} h(a + rd)\right| \ge ck$.

Clearly $f(k, 1) = f(k)$. Using the same method as that which
Rado and I used, I showed [38] that for every $c > 0$

    $f(k, c) > (1 + \alpha_c)^k$,

where $\alpha_c \to 0$ as $c \to 0$ and $\alpha_c \to \sqrt{2} - 1$ as $c \to 1$ (perhaps the
method of Schmidt would allow one to prove that $\alpha_c \to 1$ as $c \to 1$,
but this has not been done). I would expect that for every $c < 1$

    $f(k, c) < (1 + \alpha_c')^k$.

I am doubtful whether the same inequality holds for $f(k)$ (that is,
for $c = 1$). Possibly

    $\lim_{k \to \infty} [f(k, c)]^{1/k} = \alpha(c)$, $0 < c < 1$.

The problem of obtaining a good upper bound for $f(k)$ led Turán
and myself [40] to the following question: Let $1 \le a_1 < \cdots < a_l \le n$
and assume that the sequence $\{a_i\}$ does not contain an
arithmetic progression of $k$ terms. Put

    $\max l = r_k(n)$.

If we could show that, for a certain $n$, $r_k(n) < n/2$, we would
immediately obtain $f(k) \le n$. Unfortunately this has been shown
only for $k = 3$, $n = 20$. In our paper [40] we only obtain crude
inequalities for $r_3(n)$, and Szekeres conjectured that $r_k(n) = o(n)$
and that $r_k(n) < n^{1-\varepsilon_k}$. Behrend [40] observed that
$\lim_{n \to \infty} r_k(n)/n = c_k$ exists; he further showed that either all
$c_k = 0$ or $\lim_{k \to \infty} c_k = 1$.
Salem and Spencer [40] disproved $r_k(n) < n^{1-\varepsilon_k}$, in fact they showed
that $r_3(n) > n^{1 - c/\log\log n}$, and Behrend showed [40] that

(58)    $r_3(n) > n^{1 - c/\sqrt{\log n}}$.

This is still the best known lower bound for $r_3(n)$. Rankin [40]
improved (58) for $k > 3$.
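The largest size of a set of integers up to $n$ without a three-term arithmetic progression can be computed for small $n$ by a simple branch-and-bound search. The sketch below is an illustration; the function name `r3` is hypothetical, and the search is only meant for $n$ of a few dozen.

```python
def r3(n):
    """Largest size of a subset of {1, ..., n} containing no 3-term arithmetic progression."""
    best = 0
    chosen = []

    def search(m):
        nonlocal best
        # Even taking every remaining integer could not beat the current best.
        if len(chosen) + (n - m + 1) <= best:
            return
        if m > n:
            best = len(chosen)
            return
        members = set(chosen)
        # m may be added unless it completes a progression a < b < m with a = 2b - m.
        if all((2 * b - m) not in members for b in chosen):
            chosen.append(m)
            search(m + 1)
            chosen.pop()
        search(m + 1)

    search(1)
    return best

print([r3(n) for n in range(1, 21)])
```

For $n = 20$ the value found is below $n/2$, which is the kind of inequality that would immediately bound the Van der Waerden number for three-term progressions.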

Roth [40] proved $r_3(n) = o(n)$, in fact he showed that

    $r_3(n) < \frac{cn}{\log\log n}$.

Unfortunately Roth's method does not seem to give $r_4(n) = o(n)$.
It would be of great interest if one could prove that $r_k(n) < \pi(n)$
for every $k$ if $n > n_0(k)$ since this would imply that for every $k$ there
are $k$ primes in an arithmetic progression. Sevedinskij observed
that $23143 + l \cdot 30030$ is a prime for $0 \le l \le 11$. Chowla [40]
proved that there are infinitely many triplets of primes in arithmetic
progression. His proof does not use $r_k(n)$, but runs as
follows. As is well known Vinogradoff proved that every sufficiently
large odd number is the sum of three primes, and Van der
Corput, Estermann, and Tchudakoff proved using Vinogradoff's
method that the number of even numbers not exceeding $x$ which are
not of the form $p + q$ ($p \ne q$) is $o(x/(\log x)^k)$ for every $k$. Thus for
infinitely many primes $r$, $p + q = 2r$ is solvable, hence there are
infinitely many triplets of primes in arithmetic progression. This
proof no longer works for quadruplets, and I do not see how a proof
could be obtained except through an estimation of $r_k(n)$. If
$r_k(n) < \pi(n)$ could be proved, then the fact that the primes contain
arbitrarily long arithmetic progressions would be deduced
just from the fact that the primes are numerous and would not use
any special properties of the primes. This method is sometimes
successful, for example, I proved [41] in this way that to every $k$
there is an $n_k$ such that $n_k = p^2 - q^2$ has more than $k$ solutions.
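The numerical example of twelve primes in arithmetic progression quoted above, and the small triplets of primes in progression, can both be checked directly. This is a short sketch with hypothetical helper names `is_prime`, `progression` and `prime_triplets`.

```python
def is_prime(m):
    """Trial-division primality test, adequate for numbers of this size."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def progression(start, step, length):
    """The terms start, start + step, ..., start + (length - 1) * step."""
    return [start + j * step for j in range(length)]

def prime_triplets(limit):
    """Triples (p, r, q) of primes with p < r < q <= limit and p + q = 2r."""
    primes = [n for n in range(2, limit + 1) if is_prime(n)]
    pset = set(primes)
    return [(p, (p + q) // 2, q)
            for i, p in enumerate(primes) for q in primes[i + 1:]
            if (p + q) % 2 == 0 and (p + q) // 2 in pset]

terms = progression(23143, 30030, 12)
print(all(is_prime(t) for t in terms), prime_triplets(20))
```

The common difference 30030 is the product of the primes up to 13, which is what keeps every term from being divisible by a small prime.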

Denote by $f_k(n)$ the number of solutions of

    $\sum_{i=1}^{k} p_i^k = n$.

I proved in [41] that $\limsup_{n \to \infty} f_2(n) = \infty$; the proof used special
properties of primes, but it could be modified so as to use only
$\pi(n) > n/(\log n)^k$. I can also prove that $\limsup_{n \to \infty} f_3(n) = \infty$
(unpublished), and the proof of this seems to need some special
properties of the primes. I can prove nothing for $k > 3$.

Now I have to break my word given in the introduction, since
I have to mention a conjecture of Hardy and Littlewood which is
of importance in Waring's problem: Denote by $\psi_k(n)$ the number
of solutions in positive integers of

    $\sum_{i=1}^{k} x_i^k = n$.

The famous K-hypothesis of Hardy and Littlewood asserts that
$\psi_k(n) = o(n^{\varepsilon})$ for every $\varepsilon > 0$.

It is well known that this would imply $G(k) < ck$, in other words
every sufficiently large integer is the sum of at most $ck$ positive
integral $k$th powers (more precisely the K-hypothesis would imply
$G(k) \le 4k$ for all $k$, and $G(k) \le 2k + 1$ if $k$ is not a power of 2).

For $k = 2$ the K-hypothesis is well known to hold. For $k = 3$
Mahler [42] disproved the K-hypothesis; he showed by an identity
that

(59)    $\psi_3(n^{12}) > cn$.

As far as I know $\limsup_{n \to \infty} \log\psi_3(n)/\log n$ has not been determined. The
K-hypothesis is probably wrong for $k > 3$ too but this has never
been proved.

For the applications to Waring's problem it would suffice to
show that for every $\varepsilon > 0$:

(60)    $\sum_{n=1}^{x} \psi_k(n)^2 = o(x^{1+\varepsilon})$.

(60) is probably true but this has never been proved.

Chowla, Pillai, and I [42] proved that for every $k$ and infinitely
many $n$

(61)    $\psi_k(n) > \exp(c_k \log n/\log\log n)$.

(61) is of course not enough to disprove the K-hypothesis.

Let $A$ be a sequence of integers of positive density. Denote
by $\psi_k(A; n)$ the number of solutions of

    $\sum_{j=1}^{k} a_{i_j}^k = n$.

I can prove (unpublished) that

(62)    $\limsup_{n \to \infty} \psi_k(A; n) = \infty$.

More generally if $c_1$ and $c_2$ are given and $n > n_0(c_1, c_2)$, then if
$a_1 < a_2 < \cdots < a_l$, where $l > c_1 n$, there always exists an $m$
such that the number of solutions of

    $m = \sum_{j=1}^{k} a_{i_j}^k$

is greater than $c_2$. The proof is similar to the proof of (61) but is
considerably more tricky.
Turán and I conjectured that if $a_1 < a_2 < \cdots$, $a_k < ck^2$,
is an infinite sequence of integers, then the number of solutions of
$n = a_i + a_j$ cannot be bounded. I could only prove that the sums
$a_i + a_j$ cannot all be different [43]. One would expect that $a_l < cl^k$
implies that the sums of the $a$'s taken $k$ at a time cannot all be
different. Unfortunately the proof works only for even $k$, and
though the result is undoubtedly true for odd $k$, I cannot prove it.
The reason for this difficulty is very simple. For $k = 2$, if the
sums $a_i + a_j$ are all distinct, then the differences $a_i - a_j$ are also
all distinct. It is easy to see that $a_k < ck^2$ implies that the number
of solutions $S(x)$ of $0 < a_i - a_j < x$ satisfies $\lim S(x)/x = \infty$, and
therefore the differences $a_i - a_j$ cannot all be distinct. This argument
breaks down for $k = 3$. For further problems and results on this
subject I have to refer to my paper on unsolved problems [2a] and
to the interesting review article by Stöhr [43].

Varnavides [40] proved using Roth's theorem that if
$1 \le a_1 < \cdots < a_l \le n$, where $l > \alpha n$, $\alpha > 0$ fixed, $n$ sufficiently large,
then the $a$'s contain more than $c_\alpha n^2$ arithmetic progressions of three
terms. Except for the value of $c_\alpha$ this result is best possible since
the total number of arithmetic progressions
$0 < a < a + d < a + 2d \le n$ is:

(63)    $(1 + o(1)) \frac{n^2}{4}$.

It would be interesting to determine the best value of $c_\alpha$ and to find
the structure of the extremal sequence.

Many problems of combinatorial and number-theoretical nature
are discussed in my paper on unsolved problems [2a], and here I
only wish to mention a few of them in which some progress has
been made since I wrote the paper.

Denote by $a_1 < \cdots < a_k \le x$ a sequence of integers for which
all the sums $\sum_{i=1}^{k} \varepsilon_i a_i$, $\varepsilon_i = 0$ or 1, are distinct, and put

    $\max k = A(x)$.

Many years ago I asked whether

(64)    $A(x) = \frac{\log x}{\log 2} + O(1)$

holds. (64) seems surprisingly resistant to any attack.
I also asked whether $A(2^k) \ge k + 2$ is possible. This was
answered affirmatively a few years ago by Guy and Conway
(independently). Their example is unpublished. Moser and I
proved (see [2], Colloque . . . Bruxelles, 136-134) that

    $A(x) < \frac{\log x}{\log 2} + \frac{(1 + \varepsilon)\log\log x}{2 \log 2}$,

and recently Moser showed (to appear in the report of the A.M.S.
Pasadena Conference, 1963) that

(65)    $\sum_{i=1}^{k} a_i^2 \ge \frac{4^k - 1}{3}$,

with equality only for $a_i = 2^{i-1}$. Moser easily deduces from (65)
that

    $A(x) < \frac{\log x}{\log 2} + \frac{\log\log x}{2 \log 2} + O(1)$.

This is the best upper bound for $A(x)$ known up to the present.
It is quite easy to see that if $a_1 < a_2 < \cdots$ is an infinite sequence
of integers for which all the sums $\sum \varepsilon_i a_i$, $\varepsilon_i = 0$ or 1, are distinct
then for infinitely many $i$, $a_i \ge 2^{i-1}$. Another somewhat related
result states that if $A(x)$ denotes the number of solutions of
$\sum \varepsilon_i a_i \le x$ and if $A(x) = x + O(1)$ then $a_i = 2^{i-1}$ for $i > i_0$. (This
is proved in a paper, which will soon appear in Acta Arithmetica, by
P. Erdős, B. Gordon, L. A. Rubel, and E. Straus.)
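Both the distinctness of subset sums and Moser's inequality for the sum of squares are easy to test on concrete sets. The sketch below is illustrative; `subset_sums_distinct` and `moser_gap` are hypothetical names, and the set {3, 5, 6, 7} is chosen only as a small example with four elements below $2^3$.

```python
def subset_sums_distinct(a):
    """True if all 2**len(a) subset sums of the positive integers in a are distinct."""
    sums = {0}
    for x in a:
        shifted = {s + x for s in sums}
        if sums & shifted:        # a sum using x coincides with one not using x
            return False
        sums |= shifted
    return True

def moser_gap(a):
    """3 * (sum of squares) - (4**k - 1); nonnegative when Moser's inequality holds."""
    k = len(a)
    return 3 * sum(x * x for x in a) - (4 ** k - 1)

for example in ([1, 2, 4, 8], [3, 5, 6, 7], [1, 2, 3]):
    print(example, subset_sums_distinct(example), moser_gap(example))
```

For the powers of 2 the gap is exactly 0, matching the case of equality, while the set {3, 5, 6, 7} has distinct subset sums even though its largest element is below 8.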

Lorentz [44] proved the following conjecture of Straus and myself:
Let $a_1 < a_2 < \cdots$ be an infinite sequence of integers; then there
always exists a sequence $b_1 < b_2 < \cdots$ of density 0 such that
every integer $n$ can be written in the form $a_i + b_j$. In particular
he proved that if the $a$'s are the primes then the $b$'s can be chosen
so that $B(x) < c(\log x)^3$ ($B(x) = \sum_{b_i \le x} 1$). By using probabilistic
arguments [44] I improved this to $B(x) < c(\log x)^2$. The prime
number theorem trivially implies $B(x) > (1 + o(1))\log x$ and I
cannot disprove that $B(x) = (1 + o(1))\log x$. In 1956 Hanani
stated the following conjecture: If $a_1 < \cdots$; $b_1 < \cdots$ are two
infinite sequences such that every integer can be written in the form
$a_i + b_j$ then

(66)    $\limsup A(x)B(x)/x > 1$.

Special cases of (66) were proved by Narkiewicz [44]. Recently
Danzer disproved (66) (Danzer's paper has just appeared in J.
für reine und angew. Math.). It easily follows from the result of
Narkiewicz that to every $\varepsilon > 0$ there is an infinite sequence $x_i \to \infty$
such that

(67)    $A(x_i)B(x_i) - x_i > x_i^{1-\varepsilon}$.

The example of Danzer implies the existence of two sequences
satisfying

(68)    $x < A(x)B(x) < x + o(x)$.

There is a gap between (67) and (68) which as far as I know has
not yet been filled. Danzer and I conjectured that (68), for two
sequences such that every integer can be expressed as $a_i + b_j$,
would imply that

    $A(x)B(x) - x \to \infty$,

but as far as I know this has not yet been proved.

Lorentz's result implies that there exists a sequence $a_1 < a_2 < \cdots$
satisfying $A(x) < cx \log\log x/\log x$ such that every integer is of
the form $2^k + a_i$. One would expect that this can be improved to
$cx/\log x$, but this seems to present unexpected difficulties.
Davenport and I [45] proved that if $a_1 < \cdots$ is an infinite
sequence of positive lower density then there exists an infinite
subsequence $a_{i_1} < a_{i_2} < \cdots$ satisfying $a_{i_k} \mid a_{i_{k+1}}$. I conjectured that
there are infinitely many triples $a_i, a_j, a_l$ of distinct integers of the
sequence satisfying $[a_i, a_j] = a_l$. This would follow from the
following purely combinatorial theorem: Let $A_1, \ldots, A_r$ be subsets
of a set $S$ of $n$ elements and assume that there are no three
distinct sets $A_i, A_j, A_l$ for which

    $A_i \cup A_j = A_l$.

Put $\max r = f(n)$. Then

(69)    $f(n) = o(2^n)$.

Recently Sárközy and Szemerédi proved (69) (unpublished);
in fact they showed that $f(n) < c2^n/\log\log n$. Perhaps

    $f(n) < c2^n/\sqrt{n}$;


in fact perhaps f(n) = (1 + o(l)) 



Thus the above conjecture about triples is now proved. A well-known
combinatorial theorem of Sperner [46] should be mentioned here: If the
$A_i$'s are such that no one of them contains any other, then their number is
at most

    $\binom{n}{[n/2]}$.

This theorem has many applications in number theory and analysis.

Turán and I conjectured that if $a_1 < \cdots$ is an infinite sequence
of integers, and if $F(n)$ denotes the number of solutions of
$a_i + a_j \le n$, then

    $F(n) = cn + O(1)$

is impossible. Fuchs and I [47] proved that for $c > 0$

(70)    $F(n) = cn + o\left(\frac{n^{1/4}}{(\log n)^{1/2}}\right)$

is impossible. In the case $a_k = k^2$ Hardy and Landau [47] proved
that

(71)    $F(n) = \pi n + o\left((n \log n)^{1/4}\right)$

is impossible. This is the classical problem of the number of lattice
points in a large circle. It has been conjectured that in (71) the
error term is $O(n^{1/4+\varepsilon})$, but this seems very deep. It is surprising
that in our much more general case we obtain a lower bound for the
error term which is nearly as good as (71) and our proof is very
much simpler. Recently Jurkat proved that the error term in (70)
cannot be $o(n^{1/4})$ (unpublished). Fuchs and I suspected a long time
ago that a sequence $a_1 < \cdots$ can be constructed for which

(72)    $F(n) = cn + O(n^{1/4})$;

this would show that Jurkat's result is best possible, but we have not
succeeded in constructing a sequence satisfying (72). Bateman,
Kohlbecker, and Tull [47] generalized (70) by replacing $c$ by a slowly
oscillating function, but as far as I know the following simple
conjecture has not yet been proved: There does not exist a sequence
$a_1 < a_2 < \cdots$ for which the number of solutions of $a_i + a_j + a_l \le x$
is of the form $cx + O(1)$. My proof with Fuchs breaks down.

Heilbronn and I (our paper will appear in Acta Arithmetica [added in proof: 9 (1964), 149-159]) proved that if $a_1, \ldots, a_k$ are distinct residues (mod $p$), and $k > 3\sqrt{6p}$, then every residue class (mod $p$) can be written in the form

$\sum_{i=1}^{k} \varepsilon_i a_i$, $\varepsilon_i = 0$ or $1$.

This result probably holds for $k > 2\sqrt{p}$. To show this it would be sufficient to show that if $a_1, \ldots, a_k$ are $k$ distinct residues (mod $p$) then the number of distinct residues which can be written as the sum of at most $r$ distinct $a$'s is at least

(73)    $\min(p, rk - r^2 + 1)$.


Taking the $a$'s to be the residues [...] we see that (73), if true, is best possible. (73) is not even known for $r = 2$. A special case of a well-known theorem of Cauchy-Davenport [48] states that the number of distinct residues which can be written as the sum of $r$ $a$'s (not necessarily distinct) is at least

$\min(p, rk - r + 1)$.
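The Cauchy-Davenport special case quoted above is easy to check exhaustively for small primes. The sketch below is an illustration only (the helper names are assumptions, not from the text); it counts sums of $r$ not necessarily distinct elements and compares the count with $\min(p, rk - r + 1)$.

```python
from itertools import combinations, combinations_with_replacement


def r_fold_sums(residues, r, p):
    """Residues mod p that are sums of r elements of `residues`, repetition allowed."""
    return {sum(c) % p for c in combinations_with_replacement(residues, r)}


def cauchy_davenport_bound(k, r, p):
    """The lower bound min(p, rk - r + 1) for k distinct residues mod p."""
    return min(p, r * k - r + 1)


def check_prime(p, max_r=3):
    """Verify the bound for every set of distinct residues mod p and every r <= max_r."""
    for k in range(1, p + 1):
        for residues in combinations(range(p), k):
            for r in range(1, max_r + 1):
                if len(r_fold_sums(residues, r, p)) < cauchy_davenport_bound(k, r, p):
                    return False
    return True


print(check_prime(7))
# An arithmetic progression attains the bound: {0, 1, 2} mod 7 with r = 2.
print(len(r_fold_sums((0, 1, 2), 2, 7)), cauchy_davenport_bound(3, 2, 7))
```

The check covers only repeated summands; the conjectured bound (73) for distinct summands is a different statement and is not tested here.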

Heilbronn and I further proved that if $k/p^{2/3} \to \infty$ then the number of solutions of

$\sum_{i=1}^{k} \varepsilon_i a_i \equiv u \pmod{p}$, $\varepsilon_i = 0$ or $1$

is $(1 + o(1))\,2^k/p$. The condition $k/p^{2/3} \to \infty$ is best possible, and $k > cp^{2/3}$ does not suffice.

We further conjectured that if $a_1, \ldots, a_k$ are distinct residues (mod $n$) and $k > cn^{1/2}$, then

(74)    $\sum_{i=1}^{k} \varepsilon_i a_i \equiv 0 \pmod{n}$, $\varepsilon_i = 0$ or $1$

is always solvable. Perhaps (74) is solvable for every $c > \sqrt{2}$ if $n > n_0(c)$. Flohr and I could only prove that (74) is solvable if


k > 


7 = 


1 + log 2/log 3 


our proof is unpublished. 

A famous unsolved problem in number theory asks for the estimation of the least quadratic nonresidue of $p$. This problem goes back to Gauss, who proved that the least quadratic nonresidue is $< 2p^{1/2} + 1$ if $p \equiv 1 \pmod 8$; he used this estimation in his first proof of the law of quadratic reciprocity [49]. The first result which used the modern methods of analytic number theory is due to Vinogradov [49], who proved that the least quadratic nonresidue $n_2(p)$ satisfies

(75)    $n_2(p) < cp^{1/(2\sqrt{e})} (\log p)^2$.

Davenport and I [49] improved the exponent of $\log p$ to [...]. The first significant improvement on (75) was found by Burgess [49] who proved

(76) n 2 (p) <cp HV \\ogpr. 


The ingenious proof of Burgess uses the following deep result of A. Weil [50]. Let $f(x)$ be an irreducible polynomial of degree $n$; then ($\left(\frac{a}{p}\right)$ is the Legendre symbol)

(77)    $\left| \sum_{x=0}^{p-1} \left( \frac{f(x)}{p} \right) \right| < (n - 1)\sqrt{p}$.

André Weil used the methods of algebraic geometry in the proof of (77). For polynomials of degree 3 and 4, (77) was proved by Hasse [50], and weaker inequalities than (77) were proved by Davenport, Mordell, and others [50].

It seems certain that $n_2(p) < p^{\varepsilon}$ and in fact perhaps $n_2(p) < c \log p$. Turán observed that $n_2(p) > c' \log p$ for infinitely many $p$.
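For small primes $n_2(p)$ can be computed directly from Euler's criterion. The sketch below is illustrative only (the function names are assumptions); it also checks Gauss's bound $n_2(p) < 2p^{1/2} + 1$ for primes $p \equiv 1 \pmod 8$ in a small range.

```python
def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_quadratic_nonresidue(p):
    """n_2(p) for an odd prime p, using a^((p-1)/2) = -1 (mod p) for nonresidues."""
    for a in range(2, p):
        if pow(a, (p - 1) // 2, p) == p - 1:
            return a
    raise ValueError("p must be an odd prime")


def gauss_bound_holds(limit):
    """Check n_2(p) < 2*sqrt(p) + 1 for all primes p = 1 (mod 8) below `limit`."""
    for p in range(17, limit, 8):
        if is_prime(p):
            n2 = least_quadratic_nonresidue(p)
            # n2 < 2*sqrt(p) + 1 is equivalent to (n2 - 1)^2 < 4p
            if (n2 - 1) ** 2 >= 4 * p:
                return False
    return True


print([(p, least_quadratic_nonresidue(p)) for p in (7, 23, 71)])
print(gauss_bound_holds(5000))
```

Such a table says nothing about the asymptotic questions in the text; it only makes the quantity $n_2(p)$ concrete.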

Linnik [49] proved that there is a $c_\varepsilon$ so that there are at most $c_\varepsilon$ primes in $x < p < x^2$ which do not satisfy $n_2(p) < p^{\varepsilon}$. Linnik developed his famous large sieve for the purpose of proving this result.

Davenport and I [49] observed the trivial result that there exists a constant $c > 0$ so that every interval of length $cp^{1/2}$ contains both a residue and a nonresidue (mod $p$). We were not able to prove that this holds for every $c > 0$, but Burgess [49] proved the stronger result that every interval of length $p^{1/4+\varepsilon}$ contains both a residue and a nonresidue. It seems probable that $p^{1/4+\varepsilon}$ can be replaced by $p^{\varepsilon}$ or even by $c \log p$.

I proved [51] that

(78)    $\sum_{p<x} n_2(p) = (1 + o(1))\,c\,\frac{x}{\log x}$.

Very likely, if $n_k(p)$ denotes the least $k$th power nonresidue of $p$, one has

(79)    $\sum_{p<x} n_k(p) = c_k \frac{x}{\log x} + o\left(\frac{x}{\log x}\right)$,

but my knowledge of algebraic number theory was not sufficient to enable me to prove (79). The proof of (78) is surprisingly complicated; I needed to use the prime number theorem for arithmetic progressions, Brun's method, and the large sieve of Linnik and Rényi.

Denote by $f(\varepsilon, p)$ the smallest integer such that, for every $l > f(\varepsilon, p)$,


i 



I expect that 

(80) ^/(e, p) - (1 + o(l))c« 

P<x ® 

but I do not see how to attack (80). 

Denote by $r(p)$ the least primitive root of $p$. Vinogradov proved $r(p) <$ [...]. Hua, Shapiro, and I [52] improved this to

$(\nu(p - 1))^{c} p^{1/2}$.

The first significant improvement on Vinogradov's result is due to Burgess [52], who proved

$r(p) <$ [...].

Very likely $r(p) < c \log p$. Ankeny [52] deduced from the generalized Riemann Hypothesis that $n_2(p) < c(\log p)^2$, and obtained a weaker result for $r(p)$.
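The least primitive root $r(p)$ is likewise easy to tabulate for small primes. The following sketch is an illustration under the usual criterion that $g$ is a primitive root exactly when $g^{(p-1)/q} \not\equiv 1 \pmod p$ for every prime $q$ dividing $p - 1$; the function names are assumptions.

```python
def prime_factors(n):
    """Set of distinct prime factors of n, by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def least_primitive_root(p):
    """r(p) for an odd prime p: the least g whose order mod p is exactly p - 1."""
    qs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    raise ValueError("p must be an odd prime")


print([(p, least_primitive_root(p)) for p in (7, 11, 23, 41)])
```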

It seems very hard to prove that 

-( 1 +.(!)) 

P<X 

in fact I cannot even show that $\liminf r(p) < \infty$. Artin conjectured that there are infinitely many primes $p$ for which 2 is a primitive root; in fact he made a plausible conjecture about their density. As far as I know it is not even known whether to every $p$ there is a prime $q < p$ which is a primitive root of $p$.

In the last 35 years significant new results were obtained in the additive theory of prime numbers, also in Waring's problem, but I do not wish to speak much of these since several excellent books discussed them recently in great detail. I only wish to state the closest approaches known to the famous Goldbach conjecture: Selberg and Wang [53] proved, using Selberg's improvement of Brun's method, that every sufficiently large integer can be written in the form $a + b$ where $a$ has at most 2 prime factors and $b$ at most 3. Rényi proved [53] that there is an absolute constant $c$ such that every integer is the sum of a prime and an integer which has at most $c$ prime factors. Rényi used the large sieve of Linnik and Rényi and new results about the distribution of the roots of $L$-functions. I have been informed that recently Barban proved that every large integer can be written in the form $p + a$ where $\nu(a) \le 4$. One of the main new ideas of Barban's proof is a remarkable improvement of the Linnik-Rényi large sieve.

Another remarkable recent result is due to Linnik [31]. He 
proved by his dispersion method the following conjecture of Hardy 
and Littlewood: The number of solutions of n = p + u 2 + v 2 is 
of the form 


(1 + o(l))c n 


n 


c n ^ 


‘ log n vn " log log n 

where $c_n$ is a complicated constant which depends only on $n$. He also obtains asymptotic formulas for sums of the form ($d_k(m) = \sum_{x_1 \cdots x_k = m} 1$)

$\sum_{m<n} d_{k_1}(m)\, d_{k_2}(m + a)$

and

$\sum_{m<n} d_{k_1}(m)\, d_{k_2}(n - m)$

for $k_1 = 2$ and $k_2$ arbitrary. For the details and the history of this problem I have to refer to Linnik's book.

Many of the results mentioned in this chapter can be proved by Brun's method, which is perhaps our most powerful elementary tool in number theory. Recently Selberg [12] obtained a significant improvement of Brun's method and in a certain sense showed that further improvement is impossible beyond a certain limit. As far as I know the following question has not yet been investigated. Determine or estimate the smallest $f_1(x)$ with the following property: there exists a set of residue classes $a_i \pmod{p_i}$, for $p_i < f_1(x)$, such that every $1 < u < x$ satisfies at least one of the congruences $u \equiv a_i \pmod{p_i}$. Similarly for $f_2(x)$, defined as follows: there is a set of residue classes $a_i \pmod{p_i}$, $p_i < f_2(x)$, so that the number of integers $u < x$ which do not satisfy any of the congruences $u \equiv a_i \pmod{p_i}$ is $o(x/\log x)$.

Here I would like to call attention to another problem on sieve methods which as far as I know has not yet been investigated. The essential result proved by Viggo Brun was the following. Let $p < n^{\varepsilon}$, $\varepsilon = \varepsilon(k)$, and consider $k_p < k$ congruences:

(81)    $x \equiv a_j^{(p)} \pmod{p}$, $1 \le j \le k_p < k$.

Then the number of integers $x < n$ which do not satisfy any of the congruences (81) is between

(82)    $c_1 n \prod_{p} \left(1 - \frac{k_p}{p}\right)$ and $c_2 n \prod_{p} \left(1 - \frac{k_p}{p}\right)$.

(82) was improved by Selberg [12] in two ways; he permitted a larger choice of $\varepsilon = \varepsilon(k)$, and he brought $c_1$ and $c_2$ closer together.

Linnik and Rényi investigated the other extreme; in their case the number of congruences (81) is very large, and roughly speaking they prove that if "many" integers are given up to $x$ then, with the exception of a few primes, each residue class mod $p$ contains "nearly" the same number of integers if we neglect a "few" exceptional residue classes. For a precise statement I have to refer to the papers of Rényi and Linnik [53].

As far as I know nobody investigated what happens if in (81) the number of congruences increases with $p$ but not very quickly, say like $\log p$ or like $\log\log p$. I have not been able to find a reasonable application for the estimation which would correspond to (82) and this may be the reason why this question was neglected.

Now I would like to call attention to another group of problems on congruences. A set of congruences:

(83)    $a_i \pmod{n_i}$, $n_1 < n_2 < \cdots < n_k$

is called a covering set if every integer satisfies at least one of the congruences (83). The simplest covering set is 0 (mod 2), 0 (mod 3), 1 (mod 4), 5 (mod 6), 7 (mod 12). I asked if for every choice of $n_1$ there is a covering set (83). This problem seems very difficult and has been solved only for $n_1 \le 8$ (by Selfridge and others). Many similar questions can be asked; for example, it is not known if there exists a covering set with all the $n_i$ odd. A simple but not quite trivial result about covering sets of congruences states that

$\sum_{i=1}^{k} \frac{1}{n_i} > 1$ [54].
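The covering set quoted above can be verified mechanically: it suffices to test one full period, the least common multiple of the moduli. The sketch below is an illustration only; the representation of a congruence as a pair `(a, n)` and the function name are assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def is_covering(system):
    """system: list of (a, n) meaning a (mod n). True if every integer satisfies one."""
    # one full period (the lcm of the moduli) is enough to decide
    period = reduce(lambda x, y: x * y // gcd(x, y), (n for _, n in system))
    return all(any(u % n == a % n for a, n in system) for u in range(period))


SYSTEM = [(0, 2), (0, 3), (1, 4), (5, 6), (7, 12)]

print(is_covering(SYSTEM))                     # the example from the text
print(sum(Fraction(1, n) for _, n in SYSTEM))  # sum of the reciprocals of the moduli
print(is_covering(SYSTEM[:-1]))                # the same system without 7 (mod 12)
```

For this system the reciprocals sum to 4/3, in agreement with the inequality just stated, and dropping 7 (mod 12) leaves the integer 7 uncovered.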


Two congruences are called disjoint if no integer satisfies both of them. Stein and I asked: let (83) be a system of pairwise disjoint congruences for which $n_k < x$. Put $\max k = f(x)$. We conjectured that $f(x) = o(x)$. We proved that for every $\varepsilon > 0$ and $x > x_0(\varepsilon)$

$f(x) > x \exp\left(-(\log x)^{1/2+\varepsilon}\right)$.

It is surprising that the proof of $f(x) = o(x)$ presents difficulties and this is perhaps due to our overlooking an obvious idea [54].

Stein conjectured that if (83) is a disjoint system then there always exists a $0 < u \le 2^k$ for which $u \not\equiv a_i \pmod{n_i}$ ($i = 1, 2, \ldots, k$). This conjecture was recently proved by Selfridge (unpublished). The system

$2^{i-1} \pmod{2^i}$, $1 \le i \le k$

shows that this conjecture is best possible.
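The extremal system $2^{i-1} \pmod{2^i}$ can be examined directly: its congruences are pairwise disjoint, and the first positive integer satisfying none of them is exactly $2^k$. The sketch below is an illustration; the helper names are assumptions, and disjointness is tested with the standard criterion that $a \pmod m$ and $b \pmod n$ have a common solution exactly when $\gcd(m, n)$ divides $a - b$.

```python
from itertools import combinations
from math import gcd


def dyadic_system(k):
    """The congruences 2^(i-1) (mod 2^i), i = 1..k, as (a, n) pairs."""
    return [(2 ** (i - 1), 2 ** i) for i in range(1, k + 1)]


def pairwise_disjoint(system):
    """True if no integer satisfies two of the congruences."""
    return all((a - b) % gcd(m, n) != 0
               for (a, m), (b, n) in combinations(system, 2))


def first_uncovered(system, limit):
    """Least u in 1..limit satisfying none of the congruences, or None."""
    for u in range(1, limit + 1):
        if all(u % n != a % n for a, n in system):
            return u
    return None


for k in range(1, 9):
    system = dyadic_system(k)
    print(k, pairwise_disjoint(system), first_uncovered(system, 2 ** k))
```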

I conjectured that if $a_i \pmod{n_i}$, $1 \le i \le k$, is any system of congruences such that there is a $u$ for which $u \not\equiv a_i \pmod{n_i}$ ($i = 1, \ldots, k$), then there is such a $u$ for which $0 < u \le 2^k$. I could only prove that there is such a $u$ for which $0 < u < f(k)$ where $f(k)$ depends only on $k$, but I did not give an explicit estimation for $f(k)$.

Recently Elliott (in publication in Quart. J. of Math.) proved that if $a_1 < a_2 < \cdots < a_h < n$ and $h > cn/(\log n)$, where $c > 2$, then there exists a prime $q$ such that every residue class (mod $q$) is represented among the $a_i$. He uses Selberg's sieve.

8. I now refer briefly to a body of recent work on Diophantine 
equations and Diophantine inequalities in many variables. We 
consider equations first. 

The treatment of equations of additive type, such as

$f(x_1) + \cdots + f(x_n) = N$

where $f(x)$ is an integer-valued polynomial, is possible by the Hardy-Littlewood method, and presents no essentially new difficulty. In particular a homogeneous additive equation:

$a_1 x_1^k + \cdots + a_n x_n^k = 0$

always has an infinity of solutions, provided $n$ is greater than a suitable function of $k$, and provided $a_1, \ldots, a_n$ are not all of the same sign if $k$ is even.

The general homogeneous equation

$f(x_1, \ldots, x_n) = 0$

of degree $k$ offers much more difficulty. The first method of reducing such an equation to additive equations (in a smaller number of variables) was given by Richard Brauer in 1945. This method was not directly applicable in the rational number field, because it required the solubility of all additive equations of every degree $k' < k$, and this cannot be ensured for even values of $k'$. But Lewis (1957) modified Brauer's method to obtain a proof that when $k = 3$ the equation is always soluble if $n$ is sufficiently large. Birch (1957) generalized Brauer's method to prove the following remarkable theorem: every system of simultaneous equations, of odd degrees $k_1, \ldots, k_r$, is soluble in integers (not all 0) provided $n$ is greater than a certain function of $k_1, \ldots, k_r$.

Davenport (1959 and 1963) attacked the problem of a single homogeneous cubic equation directly by the Hardy-Littlewood method, and proved that the condition $n \ge 16$ is sufficient to ensure solubility. It is conjectured that $n \ge 10$ suffices and it is known that this would be best possible. A treatment on similar general lines of homogeneous equations of higher degree, and simultaneous systems of such equations, was given by Birch (1962), but here it becomes necessary (and it is indeed essential) to impose further conditions.

A connected account of much of this work, with references, is available in Davenport's notes: Analytic Methods for Diophantine Equations and Diophantine Inequalities (Ann Arbor Publishers, 1963).

There is a close connection between these problems and the problem of the solubility of homogeneous equations in $p$-adic numbers. In several recent papers, Birch and Lewis have established, for particular values of $k$, the conjecture of Artin that for such equations the condition $n > k^2$ is sufficient to ensure solubility.

As regards Diophantine inequalities, the principal result of a general character proved so far is the following: if $Q(x_1, \ldots, x_n)$ is any indefinite quadratic form with real coefficients, then the inequality

$|Q(x_1, \ldots, x_n)| < \varepsilon$

is soluble for every $\varepsilon > 0$ provided $n \ge 21$. (For references see Davenport's notes, mentioned above.) The proof is complicated. It is conjectured that $n \ge 5$ suffices, and it may even be true that $n \ge 3$ would suffice if one excluded forms which are proportional to forms with integral coefficients. A similar result (but with a very large lower bound for $n$) can probably be proved for cubic forms, but further extension seems to present great difficulties.


On Stochastic Processes

Michel Loève


I. TRADITIONAL SETUP 

1. Introduction. A characteristic proposition of probability theory, and one of the most beautiful, may be stated in intuitive terms as follows: a gambler plays related or unrelated games of any kind. Except for a miracle, his chances of ruin evaluated in terms of past outcomes approach 100 per cent or 0 per cent according as he will or will not be ruined eventually.

In probability terms, this P. Lévy zero-one law (the first martingales convergence theorem) is as follows: if $A$ is a property of (that is, event defined in terms of) a sequence $X_1, X_2, \ldots$ of random variables, then the conditional probability $P(A \mid X_1, \ldots, X_n)$ of $A$ given the first $n$ terms of the sequence converges almost surely to the indicator of $A$ ($I_A = 1$ on $A$ and 0 on its complement $A^c$). Unless otherwise stated, convergence will be for $n \to \infty$.

In measure-theoretic terms, this proposition becomes: let $X_1, X_2, \ldots$ be a sequence of measurable functions on a measure space $(\Omega, \mathcal{A}, P)$ with $P\Omega = 1$. Let $\mathcal{A}_n$ and $\mathcal{A}_\infty$ be the smallest $\sigma$-fields for which the finite sequence $X_1, \ldots, X_n$ and the whole sequence $X_1, X_2, \ldots$ are respectively measurable. If $A \in \mathcal{A}_\infty$ and $P(A \mid \mathcal{A}_n)$ is a Radon-Nikodym derivative of the restriction of $P(A \cap \cdot)$ to $\mathcal{A}_n$ with respect to the restriction $P_{\mathcal{A}_n}$ of $P$ to $\mathcal{A}_n$, then $P(A \mid \mathcal{A}_n)$ converges $P$-almost everywhere to $I_A$.

The above situation occurs regularly in probability theory: the 
phenomenological content of a concept or a proposition is intuitively 
clear and may even sound trivial. The mathematical formulation 
is quite abstract. The proof may require intricate manipulation of 
deep mathematical ideas and results, seemingly unmotivated by 
the intuitive content. However, probability theory developed and 
continues to develop its own “probabilistic intuition,” reflected in 
its terminology and, especially, in its problems. 

The language of probability theory frequently sounds unmathematical and is easily translatable in terms belonging to other branches of mathematics. The origin of its concepts and problems (ancient and new) as well as its immediate applications are frequently extremely concrete. Perhaps because of this and despite its growing importance for other branches of mathematics, most mathematics students receive no foundation in probability theory and many mathematicians are not familiar with its methods and main results and, sometimes, not even with its terminology.

Since this talk is addressed to a general mathematical audience, 
I shall begin with a short probabilistic vocabulary, as formalized 
by Kolmogorov (1933), and with the main stochastic structures and 
three results. This will provide us with a basis for discussion of 
fields and directions of research, some active, some dormant, and 
some possible. Probability theory is in full flood: the torrential 
flow creates new problems, requires new methods, uncovers new 
phenomena, and deposits new results—even in the most classical 
cases. Within the limited time at my disposal, it is useless even to 
try to give a well-rounded picture. What follows will be centered 
about a few ideas selected with obvious personal bias. 

2. Basic Vocabulary and Notation. A probability space $(\Omega, \mathcal{A}, P)$ is a normalized measure space: $\mathcal{A}$ is a Boolean $\sigma$-algebra or $\sigma$-field of events and the probability $P$ is a measure on $\mathcal{A}$ with $P\Omega = 1$. A relation holds almost surely (a.s.) if it holds outside a $P$-null event.

A random variable $X$ is a measurable function on the measurable space $(\Omega, \mathcal{A})$ to the Borel line $(R, \mathcal{S})$ where $R$ is the real line and $\mathcal{S}$ is the $\sigma$-field of its Borel sets. $X$ induces the $\sigma$-field $\mathcal{B}(X) = X^{-1}(\mathcal{S})$ of events defined on it. Its (probability) law or distribution $\mathcal{L}(X)$ or $F_X$ is defined by $F_X(S) = PX^{-1}(S)$, $S \in \mathcal{S}$, and $(R, \mathcal{S}, F_X)$ is its state or sample probability space. The expectation $EX$ of $X$ is the integral $\int X\,dP$, when this integral exists. Then its conditional expectation $E^{\mathcal{B}}X$, or $E(X \mid \mathcal{B})$, given a sub $\sigma$-field $\mathcal{B}$ of events, exists and is a $\mathcal{B}$-measurable function defined up to a $P$-equivalence by $\int_B E^{\mathcal{B}}X\,dP = \int_B X\,dP$ for every $B \in \mathcal{B}$. When $\mathcal{B}$ is the smallest sub $\sigma$-field of events for which a function $Y$ (numerical or not) on $\Omega$ is measurable, it is also denoted by $E(X \mid Y)$. When $X = I_A$, it is the conditional probability $P^{\mathcal{B}}A$, or $P(A \mid \mathcal{B})$ (or $P(A \mid Y)$) of the event $A$ given $\mathcal{B}$ (given $Y$).

A random function or (stochastic) process $X_T = (X_t, t \in T)$ is a family of random variables indexed by $t \in T$; however, later we shall use the term "process" in a more general sense and, meanwhile, we shall speak of random functions. The sample space of $X_T$ is $(R^T, \mathcal{S}^T)$. Its law or distribution $\mathcal{L}(X_T)$ or $F_{X_T}$ is defined by $F_{X_T}(S) = PX_T^{-1}(S)$, $S \in \mathcal{S}^T$; according to the (Daniell-Kolmogorov) consistency theorem it is determined by the laws $\mathcal{L}(X_{t_1}, \ldots, X_{t_n})$ of all its finite sections. Its sample probability space is $(R^T, \mathcal{S}^T, F_{X_T})$. Its expectation $EX_T$ is the family $(EX_t, t \in T)$, when all $EX_t$ exist.

3. Stochastic Properties. Stochastic or random analysis is concerned with those properties of random functions $X_T$ which can be expressed, directly or indirectly, in terms of their laws. Its main concern is with the behavior of sample functions $X_T(\omega) = (X_t(\omega), t \in T)$, $\omega \in \Omega$. In general, $T$ is a time set, a subset of the real line with the usual topological, measurability, and algebraic structures. To simplify, we let $T$ be $\{1, 2, \ldots\}$ or $[0, \infty)$, unless otherwise stated. In sample analysis there appear "sampling times" of various properties of sample functions. They are times $\tau(\omega)$ at which the sample functions $X_T(\omega)$ acquire these properties, for example, the first discontinuity time, the last exit time from a set $S$, and so on. If there is no such $\tau(\omega) \in T$, we set $\tau(\omega) = \infty$. When $\tau = (\tau(\omega), \omega \in \Omega)$ is a measurable function on $\Omega$ to $T \cup \{\infty\}$ it is a random time. Then it is an $X_T$-time if the event $[\tau \le t]$ is defined on $(X_s, s \le t)$ and $X_\tau = (X_{\tau(\omega)}(\omega), \omega \in \Omega)$ is a random variable; it is sometimes convenient to introduce an ideal state $\partial$ and set $X_{\tau(\omega)}(\omega) = \partial$ whenever $\tau(\omega) = \infty$.

4. Independence and Dependence. Probability theory comes into its own when random functions are given a "stochastic structure," that is, when their laws are given certain properties.

At the beginning was independence: random variables $X_t$, $t \in T$, are independent if $\mathcal{L}(X_{t_1}, \ldots, X_{t_n}) = \prod_k \mathcal{L}(X_{t_k})$ for every finite subset $(t_1, \ldots, t_n)$ of $T$. The basic problem, and, until recently, the only one, was concerned with the behavior of sums of independent random variables. The laws of their sums are then convolutions or compositions of the laws of the summands. Equivalently, their characteristic functions, that is, the Fourier-Stieltjes transforms $\hat{F}$ of their laws $F$ (with same affixes if any) are products of characteristic functions of the summands. The main lines of investigation are still the two following ones:

(a) Convergence of laws $F_n \to F$ in the sense of $\int_R g\,dF_n \to \int_R g\,dF$ for all $g \in C$, where $C$ is the space of bounded continuous functions $g$ on $R$. The main tool is the P. Lévy continuity theorem: $F_n$ converges to some $F$ if and only if $\lim \hat{F}_n$ exists and is continuous at the origin, and then $\lim \hat{F}_n = \hat{F}$.

(b) Convergence to a random variable of series and of arithmetic means of random variables in the sense of almost sure convergence ($\xrightarrow{\text{a.s.}}$), convergence in probability ($\xrightarrow{P}$), or convergence in the $r$th mean ($\xrightarrow{r}$).
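The statement that laws of sums of independent summands are convolutions, and that characteristic functions then multiply, can be seen on a finite example. The sketch below is an illustration only: laws are represented as dictionaries from values to probabilities, and the names `convolve` and `char_fn` are assumptions.

```python
import cmath
from fractions import Fraction


def convolve(law_x, law_y):
    """Law of X + Y for independent X, Y with finitely many values."""
    out = {}
    for x, px in law_x.items():
        for y, py in law_y.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out


def char_fn(law, u):
    """Characteristic function E exp(iuX) of a finite law."""
    return sum(float(p) * cmath.exp(1j * u * x) for x, p in law.items())


die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve(die, die)

print(two_dice[7])  # probability that two independent dice sum to 7
print(abs(char_fn(two_dice, 0.7) - char_fn(die, 0.7) ** 2))  # zero up to rounding
```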

The leit-motif of independence runs throughout the theory. Other stochastic structures are born from, and/or their investigation is guided by, the results obtained in the independence case. So far only a few sufficiently general and relatively simple structures are isolated.

Markov dependence is born from an attempt by Markov [57] to preserve the law of large numbers and the normal convergence, obtained in the Bernoulli case (coin tossing): $(X_t, t \in T)$ is Markovian if $P(X_{t_{n+1}} \in S \mid X_{t_1}, \ldots, X_{t_n}) = P(X_{t_{n+1}} \in S \mid X_{t_n})$ a.s. for every $S \in \mathcal{S}$ and every finite subset of points $t_1 < \cdots < t_{n+1}$ of $T$.

Martingales are born from an attempt by P. Lévy [47] to preserve properties of series, obtained in the case of independent summands: $(X_t, t \in T)$, when all $EX_t$ exist, is called a (sub)martingale if

$X_{t_n} = E(X_{t_{n+1}} \mid X_{t_1}, \ldots, X_{t_n})$ (with $\le$ in place of $=$ for a submartingale)

a.s. for every finite subset of points $t_1 < \cdots < t_{n+1}$ of $T$; sums $X_n = Y_1 + \cdots + Y_n$ of integrable random variables $Y_k$ with $E(Y_{k+1} \mid Y_1, \ldots, Y_k) = 0$ a.s. (martingale differences) represent integrable martingales in the P. Lévy form.

Stationarity and second-order stationarity were transposed from the ergodic theory by Khintchine [37]: the random function $(X_t, t \in T)$ is stationary if $\mathcal{L}(X_{t_1}, \ldots, X_{t_n}) = \mathcal{L}(X_{t_1+h}, \ldots, X_{t_n+h})$ for all finite subsets $(t_1, \ldots, t_n)$ and $(t_1 + h, \ldots, t_n + h)$ of $T$. The random function $(X_t, t \in T)$ with all $E|X_t|^2$ finite is a second-order r.f. and it is second-order stationary if the covariance function $\Gamma(s, t) = EX(s)X(t)$, $s, t \in T$, is stationary, that is, a function of $s - t$ only.

5. Three Main Results. Three propositions play a fundamental role in probability theory and in its applications, simultaneously as powerful tools and as sources for problems and directions of research.

The first proposition is the far-reaching extension by Doob [14, 15] of P. Lévy's zero-one law:

Martingales Convergence Theorem. Let $\{X_n\}$ be a (sub)martingale. If $\sup EX_n^+ < \infty$ then $X_n \xrightarrow{\text{a.s.}} X$. If the $|X_n|^r$ are uniformly integrable for some $r \ge 1$ then, moreover, $X_n \xrightarrow{r} X$.
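As a concrete instance of the martingale hypothesis, consider a gambler betting on a fair coin with stakes that depend on past outcomes. The sketch below is an illustration only, not from the lecture: the doubling rule in `stake` is a hypothetical betting system, and the martingale equality is checked by exact enumeration rather than simulation.

```python
from fractions import Fraction
from itertools import product


def stake(history):
    """Hypothetical betting rule: double the stake after a loss, reset to 1 after a win."""
    s = 1
    for outcome in history:
        s = 2 * s if outcome < 0 else 1
    return s


def fortune(history):
    """X_n = Y_1 + ... + Y_n with Y_k = (k-th coin outcome) * (stake chosen before toss k)."""
    return sum(eps * stake(history[:i]) for i, eps in enumerate(history))


def expected_next_fortune(history):
    """E(X_{n+1} | first n outcomes), averaging over the two equally likely tosses."""
    return sum(Fraction(1, 2) * fortune(history + (eps,)) for eps in (1, -1))


# The martingale equality holds for every history of length at most 5.
print(all(expected_next_fortune(h) == fortune(h)
          for n in range(6) for h in product((1, -1), repeat=n)))
```

Because the stake is fixed before each toss, every increment has conditional mean zero, which is the martingale-difference form; whether $\sup EX_n^+$ is finite depends on the betting rule and is not examined here.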

The second is essentially Birkhoff's [43] and von Neumann's [85] ergodic theorems, or the strong laws of large numbers, or

Stationarity Theorem. Let $\{X_n\}$ be stationary. If $EX_1$ exists then $n^{-1}(X_1 + \cdots + X_n) \xrightarrow{\text{a.s.}} E^{\mathcal{I}}X_1$, where $\mathcal{I}$ is the invariant $\sigma$-field of events defined on $\{X_n\}$ and invariant under translations of this sequence into $\{X_{n+m}\}$, $m = 1, 2, \ldots$. If $|X_1|^r$ is integrable for some $r \ge 1$ then, moreover, $n^{-1}(X_1 + \cdots + X_n) \xrightarrow{r} E^{\mathcal{I}}X_1$.

If $\mathcal{I}$ degenerates, that is, consists only of events of probability 0 or 1, $\{X_n\}$ is said to be ergodic and then $E^{\mathcal{I}}X_1$ degenerates into $EX_1$.

The third proposition is relative to the "central limit problem"; that is, the problem of convergence of laws of sums $\sum_k X_{nk}$, $k = 1, \ldots, k_n \to \infty$, of independent summands $X_{nk}$ (with laws $F_{nk}$) which are "uniformly asymptotically negligible (u.a.n.)," that is, with $\sup_k P[|X_{nk}| > \varepsilon] \to 0$ for every $\varepsilon > 0$. The solution of the problem yields the class of infinitely decomposable (i.d.) laws $\mathcal{L}(X)$, such that for every $n$ there exist $n$ independent and identically distributed random variables $Y_{n1}, \ldots, Y_{nn}$ with $\mathcal{L}(\sum_{k=1}^{n} Y_{nk}) = \mathcal{L}(X)$.

Infinite Decomposability Theorem:

KHINTCHINE [38]. The class of limit laws of sums $\sum_k X_{nk}$ of u.a.n. independent summands coincides with the class of i.d. laws.

P. LÉVY [46]. I.d. laws $F$ are characterized by a triple $(\alpha, \beta^2, L)$ such that

$\hat{F}(u) = \exp\left\{ i\alpha u - \frac{\beta^2}{2} u^2 + -\!\!\!\!\!\int_{-\infty}^{+\infty} \left( e^{iux} - 1 - \frac{iux}{1 + x^2} \right) dL(x) \right\}$

where $\alpha$ and $\beta$ are real numbers, and where the P. Lévy function $L$ is defined and nondecreasing on $(-\infty, 0)$ and on $(0, +\infty)$ with $L(\pm\infty) = 0$ and $-\!\!\!\!\!\int_{-1}^{+1} x^2\,dL(x) < \infty$. (The symbol $-\!\!\!\!\!\int_a^b$ stands for integration on $(a, 0) \cup (0, b)$.)

GNEDENKO [27]. $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(X)$, necessarily i.d. $(\alpha, \beta^2, L)$, if and only if

(i) $\sum_k F_{nk}(I) \to L(I)$ for every interval $I$ in $(-\infty, 0)$ and in $(0, +\infty)$ whose endpoints are continuity points of $L$.

(ii) $\lim_{\varepsilon \to 0} \lim_{n} \sum_k \sigma^2 X_{nk}^{\varepsilon} = \beta^2$, where $X_{nk}^{\varepsilon} = X_{nk}$ or 0 according as $|X_{nk}| < \varepsilon$ or $|X_{nk}| \ge \varepsilon$ and $\sigma^2 X_{nk}^{\varepsilon} = E(X_{nk}^{\varepsilon} - EX_{nk}^{\varepsilon})^2$.

6. First Queries. Research, within the traditional setup, is active in the convergence problem of (sub)martingales and stationary sequences, and it is extremely active in and about the central limit problem. Related queries will be found in Part II.

QUERIES 1. There are better sufficient conditions than

$\sup EX_n^+ < \infty$

for convergence of (sub)martingales $\{X_n\}$. Using their definition, but with conditioning by increasing $\sigma$-fields $\mathcal{B}_n \supset \mathcal{B}(X_1, \ldots, X_n)$, Snell's result [81] is that $\lim X_n(\omega)$ finite exists for almost all $\omega \in [\inf_k \sup_n E(X_n^+ \mid \mathcal{B}_k) < \infty]$ (the fact that $\lim X_n(\omega) > -\infty$ was observed by Luther). Chow's result [8] is that $\lim X_n$ exists a.s. when $\int_{[\tau < \infty]} X_\tau^+ < \infty$ for every $(X_n)$-time $\tau$. An immediate question arises: find a Snell form of Chow's result. (Since this talk was given this query was answered by Chazan.)

Another type of result which is better expressed in terms of 
martingale differences Y n = X n+1 - X n is that lim X n (<o) exists and 
is finite for almost all o> £ [X(sup Y n+1 \a n ) < oo]. It and others 
are particular cases of another of Chow’s conditions but whose 
formulation is involved. The two types of results overlap but do not contain one another. Question: Is there a sufficiently simple condition containing both types? The weaker the condition the more hope to answer a basic question: Find necessary and sufficient conditions for a.s. convergence of (sub)martingales; in particular, for a.s. convergence to a finite limit which would yield the three series criterion for series of independent summands.

In the case of series of independent summands, convergence in law implies convergence in probability which implies a.s. convergence to a finite limit (the reverse implications are always true). Question: Under what conditions (to be automatically fulfilled in the above case) do these implications hold for (sub)martingales?

QUERIES 2. Riesz's method of proof for the stationarity theorem extends to nonstationary sequences and some results were obtained [54] without assuming a specific type of dependence. Question: Apply this method, or a related one, but to specific nonstationary types.

In the stationarity theorem, if the $X_n$ are independent (and identically distributed) their sequence is ergodic and $E^{\mathcal{I}}X_1$ degenerates into $EX_1$. In fact, a converse result then exists: if $n^{-1}(X_1 + \cdots + X_n)$ converges a.s., necessarily to a constant $c$, then $c$ finite implies that $EX_1$ exists and equals $c$ (Kolmogorov) while $c = +\infty$ $(-\infty)$ implies that $EX_1^+$ $(EX_1^-) = +\infty$ [55]. Question: Find necessary and sufficient conditions for a.s. convergence, necessarily to $+\infty$ or $-\infty$, when $EX_1$ does not exist. A partial result is known [12].

Little is known about the "speed" of convergence and especially of a.s. convergence, in the (sub)martingale and stationary cases. The problem is important and ought to be systematically investigated. A guideline may be provided by whatever is known about it in the independence case. Perhaps also the methods in [55, pp. 514-519] may help.

QUERIES 3. The existing proofs of Gnedenko's theorem are cumbersome. At least for didactic purposes, a less computational and technical proof would be of value. Perhaps it may help to note that condition (i) is equivalent to $\mathcal{L}(\min_k X_{nk}) \to \mathcal{L}(Y)$ and $\mathcal{L}(\max_k X_{nk}) \to \mathcal{L}(Z)$, with $F_Y(x) = 1 - e^{-L(x)}$ or 1 and $F_Z(x) = 0$ or $e^{L(x)}$, according as $x < 0$ or $x > 0$. Question: For what "intuitive" reasons do the two extrema of the summands determine the P. Lévy function $L$?

The central limit problem extends at once to a "central asymptotic problem." It is concerned with asymptotic equivalence of distributions $F_n$ of sums $\sum_k X_{nk}$ and of i.d. distributions $G_n$, in the sense of

$\int_R g\,dF_n - \int_R g\,dG_n \to 0$ for every $g \in C$.

Sometimes this extension is unavoidable. For example, Doblin [13] showed that even in the case of independent and identically distributed random variables there may be no norming constants $a_n$ and $b_n$ such that the laws of normed sums

$\frac{X_1 + \cdots + X_n}{b_n} - a_n$

converge, except to a degenerate law (this is always possible). Yet the problem of asymptotic behavior remains. The idea of comparing laws of sums with laws of the desired limit type goes back to Lindeberg's method for normal convergence (extended in [51] to dependent summands and without finite moments). In fact, a scrutiny of the methods used in the central limit problem shows that the comparison method is already there. It has been isolated and, in part, systematically investigated [53]. Question: Transpose whatever is known about the central limit problem under various restrictions (normed sums, normed sums of identically distributed summands) to the asymptotic one.

The asymptotic problem comes in naturally when the summands 
are dependent, and then “weighted” comparison laws $G_n$ appear. 
A conditional distribution $F^{\mathcal{C}}$ of a certain type, say normal or i.d., 
yields a weighted normal or i.d. distribution $EF^{\mathcal{C}}$. This problem 
was investigated [55] without imposing a priori a specific type of 
dependence. Question: Investigate the central asymptotic problem 
for various specific types of dependence. Some partial results 
are known for the martingale case. For example, if the summands 
$X_{nk}$ are martingale differences, then, under Liapounov’s condition 
$\sum_k E|X_{nk}|^{2+\delta} \to 0$ $(\delta > 0)$, the laws $\mathcal{L}(\sum_k X_{nk})$ are asymptotically 
equivalent to weighted normal ones [55]; similarly under Lindeberg’s 
condition. Question: Investigate the asymptotic behavior of laws 
of martingale sequences $\{X_n\}$ and, in particular, of stationary ones. 
A partial result is known [3, 34] in the stationary ergodic case for 
square integrable martingales: $\mathcal{L}(X_n/\sqrt{n})$ converges to a normal 
law. 
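
The stationary ergodic case can be illustrated numerically. The following is only a sketch under assumptions that are not in the text: the martingale differences are taken to be $d_k = e_k(1 + e_{k-1}/2)$ with independent fair signs $e_k$, so that $E(d_k \mid \text{past}) = 0$ and $Ed_k^2 = 1.25$; the function names, the seed, and the sample sizes are hypothetical choices.

```python
import random

def martingale_difference_path(n, rng):
    """Differences d_k = e_k * (1 + e_{k-1} / 2) built from fair signs e_k.

    Given the past, d_k has conditional mean 0, so the partial sums form
    a martingale; the sequence is stationary and ergodic but not independent.
    """
    previous = 1 if rng.random() < 0.5 else -1
    path = []
    for _ in range(n):
        current = 1 if rng.random() < 0.5 else -1
        path.append(current * (1 + previous / 2))
        previous = current
    return path

def normed_sums(n, replications, seed=0):
    """Independent copies of X_n / sqrt(n), X_n being the martingale at time n."""
    rng = random.Random(seed)
    return [sum(martingale_difference_path(n, rng)) / n ** 0.5
            for _ in range(replications)]

def summary(values, variance):
    """Empirical second moment and fraction within one standard deviation."""
    second_moment = sum(v * v for v in values) / len(values)
    inside = sum(abs(v) <= variance ** 0.5 for v in values) / len(values)
    return second_moment, inside
```

For a normal law with variance 1.25 the second value returned by `summary(normed_sums(200, 2000), 1.25)` should be near 0.68; a simulation of this kind suggests the limit but of course proves nothing.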
QUERIES 4. The central limit problem can be immediately 
reformulated for random numbers $k_n$ of summands. Besides a few 
generalities [53] and a result of Anscombe [1], the investigation is so 
far concerned only with the case of limit laws of normed sums of 
independent and identically distributed summands. The first 
results were obtained by Robbins [76] followed by Tate [83] in the 
case of random numbers $k_n$ independent of the summands $Y_k$. 
Rényi [74] showed that, without this assumption, normal convergence 
of $\mathcal{L}(\sum_{k=1}^{n} Y_k/\sqrt{n})$ implies that of $\mathcal{L}(\sum_{k=1}^{k_n} Y_k/\sqrt{k_n})$ provided 
$k_n/n \to \xi$, a discrete random variable. Then the “discrete” 
restriction was eliminated [5]. All these results are but special 
cases of much more general ones of Wittenberg [87] who uses the 
comparison method by the (Kolmogorov-Smirnov) vertical distance 
$\rho(F_n, G_n) = \sup_x |F_n(x) - G_n(x)|$. It appears that this 
problem now has a satisfactory answer, but only in the case of laws 
of independent and identically distributed summands. It seems 
that the ground is prepared for an attack on the limit laws problem 
for normed sums, then for sequences of sums $\sum_k X_{nk}$. Practically 
nothing is known about other types of convergence. In fact, all 
the preceding queries in this section can be reformulated in terms 
of random numbers of summands. There appear then a number of 
problems and not always very difficult ones. 
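
For distribution functions that are step functions, the vertical distance $\sup_x |F(x) - G(x)|$ is attained at a jump point, so it can be computed exactly. The sketch below assumes both laws are given as finite samples (empirical distribution functions); the function name is hypothetical.

```python
from bisect import bisect_right

def vertical_distance(sample_f, sample_g):
    """sup_x |F(x) - G(x)| for the empirical distribution functions of two samples.

    Both functions are constant between consecutive jump points, so it
    suffices to compare them at every point of either sample.
    """
    f_sorted = sorted(sample_f)
    g_sorted = sorted(sample_g)
    jump_points = sorted(set(f_sorted) | set(g_sorted))
    return max(
        abs(bisect_right(f_sorted, x) / len(f_sorted)
            - bisect_right(g_sorted, x) / len(g_sorted))
        for x in jump_points
    )
```

The value is the same whether distribution functions are taken left- or right-continuous, since only the supremum matters.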

QUERIES 5. Convergence (and comparison) of laws in terms of 
the vertical distance is but one of the possible types. A family of types 
may be defined as follows. If $B$ is a subspace of bounded measurable 
functions, then $F_n$ and $G_n$ are asymptotically $B$-equivalent 
($F_n - G_n \xrightarrow{B} 0$) if $\int_R g\,dF_n - \int_R g\,dG_n \to 0$ for every $g \in B$. The 
usual convergence of laws obtains for $B = C$. Yet in Markov 
processes appear naturally various subspaces of $C$ such as the subspaces 
of uniformly continuous functions, of those which vanish at 
infinity, and of those which vanish off compacts [55, pp. 622-630]. 
It seems potentially interesting to examine the whole central limit 
problem and its variants in terms of these types of convergence. 

In fact, various types of convergence of laws and, especially, 
the vertical distance type are of value for the “central approximation 
problem”: Find in terms of $n$ how “close” $\mathcal{L}(\sum_k X_{nk})$ is to its limit 
law or to a class of i.d. laws. This is an important problem, for 
limit theorems are not quite satisfactory when the “speed” of convergence 
is unknown. There is a very large number of partial 
results, mostly in the case of independent and identically distributed 
summands. One direction of research began with Liapounov [49], 
Berry [2], Esseen [20], and others, and continues with Cramér [11], 
Petrov [68-69], and many others. There, the apparatus of characteristic 
functions is constantly used. Another direction, born 
recently, begins and continues with Prokhorov [70-71], Kolmogorov 
[41], Meshalkin [59], Rogozin, and Le Cam [45]. It is 
concerned primarily with vertical and P. Lévy distances of distributions 
$F_n$ of sums of independent random variables $Y_n$, with 
common distribution $F$, from i.d. distributions. It uses direct 
probabilistic methods and exploits P. Lévy’s concentration function. 
Some beautiful results are already known: 

(1) If s n (G) = sup F p(F n , G), there exist constants c, c' and i.d. 
distributions G n , G n ' such that s n {G n ) = cn ~^ (Kolmogorov) and 
s n (G n ') c'n ~ 54 (log n)~ i (Meshalkin). 

(2) If $P[X_n \ne 0] = p_n$ and $p = \sup_n p_n$, then $\sup_n \rho(F_n, G_n) \le 
25p^{1/3}$, for i.d. distributions $G_n$ given by $G_n = \exp[\sum_{k=1}^{n} (F_k - 1)]$ 
(Le Cam). There is an overlap between Kolmogorov’s and Le 
Cam’s results but the connection is not clear and ought to be investigated. 
Also, it may be useful to introduce approximations by 
weighted i.d. laws. 

QUERIES 6. A number of unsolved problems about characteristic 
functions arise from probabilistic considerations. For example, 
it would be useful to have a representation of the class of characteristic 
functions with a given modulus. One may begin with 
specific moduli. Note that if the $X_n$ are independent random 
variables, then the laws of sums of random variables $\pm X_n$ have 
the same modulus. Another useful result would be an answer to 
the problem of representation of classes of characteristic functions 
with the same values in some neighborhood of the origin. In 
particular, when is the class a singleton? This problem is related 
to that of decomposition of laws and to the continuity theorem. A 
partial answer to the particular problem is known [42]. Important 
recent advances in the decomposability problem, due to Linnik, 
raise still other questions. Since all these problems, once stated, 
are essentially analytic, I shall be content here with referring to 
Linnik’s book [50], to P. Lévy [48], and to Lukacs [56]. 

Characteristic functions are, up to constant factors, those continuous 
functions of positive-definite type $\Gamma(s, t)$ which depend only 
upon the difference $s - t$ of the arguments (Bochner). In turn, 
the class of all functions of positive-definite type coincides with that 
of covariance functions. It would be of value to second-order 
analysis to transpose known properties and unsolved problems for 
characteristic functions to covariance functions. For example, find 
conditions under which a continuous covariance function $\Gamma(s, t)$ is 
“harmonizable,” that is, of the form 

$$\Gamma(s, t) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{i(sx - ty)}\, d\,d'\gamma(x, y),$$

where $\gamma$ is a covariance. These conditions ought to be automatically 
satisfied when $\Gamma(s, t)$ is a function of $s - t$ only, so as to 
have an extension of the important Bochner theorem. 

Functions of positive-definite type appear in various branches 
of mathematics (integral equations, reproducing kernels and functions 
of several complex variables, and so on). Their interpretation 
as covariances may yield new results or simplify existing proofs. 
For example, Schur’s theorem on preservation of the positive-definite 
type under multiplications is immediate when independent 
random functions are used. Also there are mutual interactions 
between second-order random functions and reproducing kernels 
theories [55, p. 490] which ought to be investigated systematically. 
Only some very partial results are known [66]. 
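
The remark on Schur’s theorem can be spelled out in one line. The following is a sketch, assuming that two functions of positive-definite type are realized as $\Gamma_1(s, t) = EX_s\overline{X_t}$ and $\Gamma_2(s, t) = EY_s\overline{Y_t}$ by second-order random functions $X$ and $Y$ which are independent of each other; the symbols $X$, $Y$, $Z$, $\Gamma_1$, $\Gamma_2$ are introduced here only for illustration.

```latex
\[
Z_t = X_t Y_t, \qquad
E\,Z_s \overline{Z_t}
  = E\,X_s \overline{X_t}\,\cdot\, E\,Y_s \overline{Y_t}
  = \Gamma_1(s, t)\,\Gamma_2(s, t),
\]
```

Independence gives the factorization and also $E|Z_t|^2 = E|X_t|^2 E|Y_t|^2 < \infty$, so that $\Gamma_1\Gamma_2$ is itself a covariance function and hence of positive-definite type.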

II. DISCUSSION 

The traditional setup is fast breaking down. The foundations 
remain but various traditional restrictions are being gradually 
removed. New mathematical problems and phenomena are being 
discovered. In fact, some applications of probability theory to 
mathematics and to physics are shaking its very foundations. 

1. Stochastic Structures. Probability theory is concerned with 
stochastic structures. Yet, to quote Doob [16], “progress has certainly 
been slow in the development of new types.” Our first task 
is to analyze the known types and the accretion of substructures. 

Let $(\Omega, \mathcal{A}, P)$ be our probability space and let the σ-fields under 
consideration be contained in $\mathcal{A}$. To avoid technicalities, assume 
that every sub σ-field, say, $\mathcal{B}$ contains the σ-ideal $\mathcal{N}$ of $P$-null events. 
This can always be achieved; it suffices to replace $\mathcal{B}$ by the σ-field 
generated by $\mathcal{B}$ and $\mathcal{N}$. 
Let $(\mathcal{B}_t, t \in T)$ be a family of σ-fields. In phenomenological 
terms, $\mathcal{B}_t$ is the class of “observable outcomes” at time $t$. To 
every $t \in T$, we associate the past σ-field $\mathcal{A}_t = \bigvee_{r \le t} \mathcal{B}_r$, the future 
σ-field $\mathcal{A}^t = \bigvee_{s \ge t} \mathcal{B}_s$ and the tail σ-field of the future $\mathcal{T} = \bigcap_t \mathcal{A}^t$ 
(also the tail σ-field $\mathcal{T}'$ of the past when $T$ has no first element). 
If the $\mathcal{B}_t$ are nondecreasing in $t$ so that $\mathcal{A}_t = \mathcal{B}_t$ we may 
and shall then write $\mathcal{A}_t$ in lieu of $\mathcal{B}_t$. We associate random times 
$\tau$, say, to the σ-fields of the past: $\tau$ is an $(\mathcal{A}_t)$-time if $[\tau \le t] \in \mathcal{A}_t$, 
$t \in T$; all $t \in T$ are trivially such times. 
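
The defining property of a random time can be checked by brute force on a small example. The sketch below assumes a hypothetical setting not taken from the text: $T = \{1, \ldots, n\}$, the σ-field of the past at time $t$ is generated by the first $t$ tosses of a $\pm 1$ coin, and an event belongs to it exactly when its indicator is a function of those $t$ tosses. All function names are illustrative.

```python
from itertools import product

def is_random_time(tau, horizon):
    """True if [tau <= t] is determined by the first t tosses, for t = 1..horizon."""
    paths = list(product((-1, 1), repeat=horizon))
    for t in range(1, horizon + 1):
        decided = {}
        for path in paths:
            event = tau(path) <= t
            # Two paths sharing the first t tosses must agree on the event.
            if decided.setdefault(path[:t], event) != event:
                return False
    return True

def first_passage(path, level=2):
    """First time the partial sums reach `level` (horizon + 1 if never)."""
    total = 0
    for t, step in enumerate(path, start=1):
        total += step
        if total >= level:
            return t
    return len(path) + 1

def last_maximum(path):
    """Last time the partial sums attain their maximum: it looks into the future."""
    best, when, total = None, 0, 0
    for t, step in enumerate(path, start=1):
        total += step
        if best is None or total >= best:
            best, when = total, t
    return when
```

A first passage time passes the check, as does any constant time, while the time of the last maximum does not, since deciding it requires the tosses still to come.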

A stochastic process $(X_t, \mathcal{B}_t, t \in T)$ is a family of σ-fields $\mathcal{B}_t$ 
with associated $\mathcal{B}_t$-measurable functions $X_t$. When $\mathcal{B}_t = \mathcal{B}(X_t)$, 
that is, the $X_t$ induce the $\mathcal{B}_t$, we need not and do not mention these 
σ-fields and thus we recover the traditional processes $(X_t, t \in T)$. 

Total structures: stationarity. Families of σ-fields may be defined 
in terms of one σ-field $\mathcal{B}$ and an additive group or semigroup $(\theta_t, 
t \in T)$ of translations, that is, mappings of events into events which 
preserve complementations and countable unions, hence intersections. 
Thus to $\mathcal{B}$ are associated its translates, the σ-fields $\mathcal{B}_t = 
\theta_t\mathcal{B}$; we may and do restrict $\mathcal{A}$ to $\bigvee_{t \in T} \theta_t\mathcal{B}$. The events which 
remain invariant under translations form the $(\theta_t)$-invariant σ-field $\mathcal{I}$; 
note that it is contained in the tail σ-field $\mathcal{T}$ of the future. When $\mathcal{I}$ degenerates, 
the $\theta_t\mathcal{B}$ form an ergodic family. 

Translates of sets yield translates of their indicators by $\theta_t I_B = 
I_{\theta_t B}$ and, by extension, translates of measurable functions $X$ into 
measurable functions $\theta_t X$ by $[\theta_t X \in S] = \theta_t[X \in S]$, $S \in \mathcal{S}$. The 
family $(\theta_t\mathcal{B}, t \in T)$ is stationary if the probability $P$ is $(\theta_t)$-invariant, 
that is, $P\theta_t = P$. Then the stochastic process $(\theta_t X, \theta_t\mathcal{B}, t \in T)$ is 
a stationary process. 

Conditional independence. In the background of independence 
and of Markov dependence lies the concept of conditional independence 
of σ-fields (isolated in [55]), because the random variables 
which figure in the traditional description of these types intervene 
only through their induced σ-fields. 

Two classes $\mathcal{C}_1$ and $\mathcal{C}_2$ of events are conditionally independent given 
a σ-field $\mathcal{C}$ if $P^{\mathcal{C}} C_1 C_2 = P^{\mathcal{C}} C_1 \cdot P^{\mathcal{C}} C_2$ a.s., $C_1 \in \mathcal{C}_1$, $C_2 \in \mathcal{C}_2$; if $\mathcal{C}_1$ 
and $\mathcal{C}_2$ are σ-fields then, equivalently, $P^{\mathcal{C}_1 \vee \mathcal{C}} C_2 = P^{\mathcal{C}} C_2$, $C_2 \in \mathcal{C}_2$ 
(or $P^{\mathcal{C}_2 \vee \mathcal{C}} C_1 = P^{\mathcal{C}} C_1$, $C_1 \in \mathcal{C}_1$), a.s. (since any pair of σ-fields is 
conditionally independent given $\mathcal{A}$, we may exclude this trivial 
case). In a σ-field $\mathcal{C}_1$ there is a largest such $\mathcal{C}$, namely $\mathcal{C}_1$ itself 
(whatever be $\mathcal{C}_2$). There is also a smallest one, namely the smallest 
σ-field for which all $P^{\mathcal{C}_1} C_2$ are measurable, the σ-field of conditional 
independence of $\mathcal{C}_2$ upon $\mathcal{C}_1$; similarly for such σ-fields in a σ-field 
$\mathcal{C}_2$. Note that two families of σ-fields are conditionally independent 
given $\mathcal{C}$ if and only if it is so for the smallest σ-fields generated by 
each family. 
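
On a finite probability space the defining product relation can be verified exactly. The sketch below assumes a hypothetical three-step, two-state Markov chain $(X_1, X_2, X_3)$ with an illustrative initial law and transition matrix; it checks that the past $[X_1 = a]$ and the future $[X_3 = b]$ are conditionally independent given the present $X_2$, although they are not independent.

```python
from fractions import Fraction
from itertools import product

# Illustrative (hypothetical) initial law and transition probabilities.
INITIAL = {0: Fraction(1, 3), 1: Fraction(2, 3)}
STEP = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
        1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}

def path_probability(x1, x2, x3):
    return INITIAL[x1] * STEP[x1][x2] * STEP[x2][x3]

def prob(event):
    """Probability of the set of paths (x1, x2, x3) satisfying `event`."""
    return sum(path_probability(*w)
               for w in product((0, 1), repeat=3) if event(w))

def conditionally_independent_given_present():
    """Check P(past, future | present) = P(past | present) P(future | present)."""
    for a, c, b in product((0, 1), repeat=3):
        p_c = prob(lambda w: w[1] == c)
        joint = prob(lambda w: w == (a, c, b)) / p_c
        past = prob(lambda w: w[0] == a and w[1] == c) / p_c
        future = prob(lambda w: w[1] == c and w[2] == b) / p_c
        if joint != past * future:
            return False
    return True

def past_and_future_independent():
    """Check plain independence of X_1 and X_3 (expected to fail here)."""
    return all(
        prob(lambda w: w[0] == a and w[2] == b)
        == prob(lambda w: w[0] == a) * prob(lambda w: w[2] == b)
        for a, b in product((0, 1), repeat=2)
    )
```

Exact rational arithmetic is used so that the equalities are tested without rounding.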

This concept leads naturally to the following stochastic structure: 
the σ-fields $\mathcal{B}_t$, $t \in T$, are $(\mathcal{C}_t)$-independent if $\mathcal{A}_t$ and $\mathcal{A}^t$ are conditionally 
independent given the σ-field $\mathcal{C}_t$ for every $t \in T$. If 
$\mathcal{C}_t = \mathcal{B}_t$ we have Markov dependent σ-fields $\mathcal{B}_t$; note that for any 
family $(\mathcal{B}_t, t \in T)$ its σ-fields of conditional independence (of $\mathcal{A}^t$ 
upon $\mathcal{A}_t$) are Markov dependent. If $\mathcal{C}_t = \mathcal{C}$, the $\mathcal{B}_t$ are $\mathcal{C}$-independent. 
In particular, if $\mathcal{C}$ degenerates, then the $\mathcal{B}_t$ are independent. 

A stochastic process $(X_t, \mathcal{B}_t, t \in T)$ will be said to have any of 
the above structures if the $\mathcal{B}_t$ have it. Also, if the $\mathcal{B}_t$ are $\mathcal{C}$-independent 
and the $X_t$ have the same conditional law given $\mathcal{C}$, we may say 
that the process is $\mathcal{C}$-exchangeable. When $\mathcal{B}_t = \mathcal{B}(X_t)$, the Markov, 
independence, and exchangeability structures reduce to the traditional 
ones. 

Among the stochastic structures based on conditional independence, 
$\mathcal{C}$-independence is, in many ways, the most direct extension 
of independence. Problems are immediate and not necessarily 
difficult, while the answers may be quite useful. For, transposition 
of results and problems in the independence case is obvious: apply 
the results to $\mathcal{C}$-conditioned distributions $F^{\mathcal{C}}$, then take expectations, 
and weighted distributions $EF^{\mathcal{C}}$ appear. The first systematic 
work in this direction is that of Bühlmann [6]. He is concerned with 
exchangeable random variables $X_n$ (centered at expectations) with 
finite variances. They are reduced to $\mathcal{C}$-independent and identically 
$\mathcal{C}$-conditionally distributed random variables, and weighted normal 
convergence follows. As a particular case, the following elegant 
necessary and sufficient condition for normal convergence of $(X_1 + 
\cdots + X_n)/\sqrt{n}$ is obtained: $n^{-1}(X_1 + \cdots + X_n) \to 0$ and 
$n^{-1}(X_1^2 + \cdots + X_n^2) \to 1$. He also introduced and discussed 
random functions with “exchangeable increments.” In a different 
direction, Freedman [22a] obtained important extensions of de 
Finetti’s theorem. Question: What about $\mathcal{C}$-exchangeable random 
variables without finite moments? Weighted stable laws will then 
appear. What about the whole central limit problem and its 
variants? What about other types of convergence? 
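
How a weighted normal law differs from a normal one can be seen on a toy exchangeable sequence. The sketch below rests on assumptions that are not in the text: $X_i = Ve_i$ with independent fair signs $e_i$ and a random scale $V$, independent of the signs and uniform on a finite set, so that the $X_i$ are conditionally independent and identically distributed given $V$. The kurtosis of the normed sum is computed exactly; the value 3 is that of a normal law. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def sign_sum_moment(n, power):
    """Exact E[(e_1 + ... + e_n)^power] for independent fair signs e_i = +-1."""
    return sum(Fraction(comb(n, k), 2 ** n) * (2 * k - n) ** power
               for k in range(n + 1))

def kurtosis_of_normed_sum(n, scales):
    """Kurtosis of (X_1 + ... + X_n)/sqrt(n) with X_i = V e_i, V uniform on `scales`.

    Given V the sum is V times a sum of signs, so moments factor; the
    norming by sqrt(n) cancels in the ratio.
    """
    count = len(scales)
    ev2 = sum(Fraction(v) ** 2 for v in scales) / count
    ev4 = sum(Fraction(v) ** 4 for v in scales) / count
    second = ev2 * sign_sum_moment(n, 2)
    fourth = ev4 * sign_sum_moment(n, 4)
    return fourth / second ** 2
```

With a single scale the kurtosis is $3 - 2/n$ and tends to 3; with two distinct scales it tends to $3EV^4/(EV^2)^2 > 3$, the signature of a mixture of normal laws. In this toy case $n^{-1}(X_1^2 + \cdots + X_n^2) = V^2$, which is not constant.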
Asymptotic structures: asymptotic independence. The main problem 
of random analysis is to describe the behavior of processes with 
an imposed structure as time $t$ varies. It is frequently reduced to 
the problem of the asymptotic behavior of stochastic sequences 
(as $t \to \infty$), and then we shall assume, to simplify, that $T = \{1, 
2, \ldots\}$ or $T = \{\ldots, -1, 0, 1, \ldots\}$, according to the context. 

Asymptotic behavior ought to be determined by asymptotic 
stochastic properties. This leads to associating to stochastic 
structures their asymptotic types. The question is how to select 
those asymptotic properties of a structure so as to preserve the 
desired asymptotic results. The more of them we impose, the 
closer we may come to similar asymptotic behaviors. This field 
of research is at its beginning. It is mostly concerned with imposing 
upon a stationary process various versions of “asymptotic independence” 
so as to preserve normal convergence and sometimes a 
law of large numbers. 

The weakest interpretation of asymptotic independence in the 
stationary case is ergodicity: the invariant σ-field degenerates. 
Then in the stationarity theorem the almost sure limit degenerates 
into $EX_1$, as in the case of independence. A slightly less weak 
interpretation is that the tail σ-field degenerates. This is equivalent 
to a “mixing” property $(M_0)$: $\sup_{A \in \mathcal{A}^t} |PAB - PAPB| \to 0$ for 
every $B \in \mathcal{A}$. But to achieve the specific goals mentioned above, 
stronger mixing properties had to be introduced: There exists a 
function $c(t)$ decreasing to 0 from some $t$ on, such that for every 
$A \in \mathcal{A}_s$, $B \in \mathcal{A}^{s+t}$ 

$(M_1)$ $|PAB - PAPB| \le c(t)$ or 

$(M_2)$ $|PAB - PAPB| \le c(t)PA$ or 

$(M_3)$ $|PAB - PAPB| \le c(t)PAPB$. 

The first was used by Rosenblatt [77] and the second by Rozanov 
[79] and Ibragimov [35], primarily to achieve normal convergence. 
The third one was used in [5] to obtain laws of large numbers. Some 
properties of the independent and identically distributed case were 
transposed. For example, Ibragimov found, under $(M_1)$, necessary 
and sufficient conditions for normal convergence when the second 
moment is finite and showed that, under $(M_2)$, the limit laws of 
normed sums are stable. 
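
The size of $|PAB - PAPB|$ can be computed exactly in a simple case. The sketch below assumes a hypothetical stationary two-state Markov chain with illustrative jump probabilities and looks only at the elementary events $A = [X_0 = i]$ and $B = [X_t = j]$, not at the full σ-fields of the past and of the future; the names are not from the text.

```python
from fractions import Fraction

# Hypothetical jump probabilities 0 -> 1 and 1 -> 0.
A, B = Fraction(1, 4), Fraction(1, 5)
STEP = ((1 - A, A), (B, 1 - B))
STATIONARY = (B / (A + B), A / (A + B))

def power(matrix, t):
    """t-step transition matrix, by repeated exact multiplication."""
    result = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
    for _ in range(t):
        result = tuple(
            tuple(sum(result[i][k] * matrix[k][j] for k in range(2))
                  for j in range(2))
            for i in range(2))
    return result

def dependence_gap(t):
    """max over i, j of |P[X_0 = i, X_t = j] - P[X_0 = i] P[X_t = j]|."""
    p_t = power(STEP, t)
    return max(abs(STATIONARY[i] * p_t[i][j] - STATIONARY[i] * STATIONARY[j])
               for i in range(2) for j in range(2))
```

For this chain the gap equals $\pi_0\pi_1|1 - a - b|^t$, with $a$, $b$ the two jump probabilities and $\pi$ the stationary law, so it decreases geometrically in $t$.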
A different type of mixing was introduced by Rényi [73] and used 
to transpose limit laws theorems for normed sums of independent 
summands to various cases of dependence. A sequence $\{A_n\}$ of 
events is mixing if $PA_mA_n \to pPA_m$ $(0 < p < 1)$ for every $m$, 
equivalently, $PA_nA \to pPA$ for every $A \in \mathcal{A}$. It implies that 
$QA_n \to p$ for every $P$-continuous probability $Q$. Note that if 
there is a limit law for normed sums of independent summands $X_n$, 
then for every $x$ the sequences of events $[X_n < x]$ have this mixing 
property. Sucheston [82] investigated such types of mixing and 
their direct consequences. 

QUERY. Investigate systematically the influence of the various 
types of asymptotic independence for martingale and Markov 
structures, stationary then nonstationary. Consider the various 
types of convergence. 

Asymptotic conditional independence. The importance of $\mathcal{C}$-independence 
becomes apparent when it is given an asymptotic form. 
The first systematic work in this direction is Cogburn’s deep 
analysis of stationary sequences on $T = \{\ldots, -1, 0, +1, \ldots\}$ 
([9], also [10]) containing as particular cases many results, new and 
known, under asymptotic independence. Among others, he uses 
the following asymptotic conditional independence given the invariant 
σ-field $\mathcal{I}$: 

$$\left| \frac{1}{n} \sum_{k=0}^{n-1} P^{\mathcal{A}_0} \theta_k A - P^{\mathcal{I}} A \right| \le c(n) \downarrow 0$$

for all $A \in \mathcal{A}^0$. A number of results follows for the stationary 
Markov case, the central limit, and the ranking limit problems. 
Weighted i.d. laws become the standard ones, and degenerate into 
i.d. ones in the ergodic case. Question: Examine conditional forms 
for various mixing conditions mentioned above. What becomes of 
the known results under these conditional forms? Proceed then to 
the nonstationary case. 

Asymptotic stationarity. From a phenomenological point of view, 
stationarity describes statistical equilibrium. But this is a “long 
run” property of phenomena, an asymptotic one. Thus, to provide 
more realistic models for phenomena, it is natural to consider 
“asymptotic stationarity.” It may be given various forms, say, 
$P\theta_t \to \bar{P}$, where $\bar{P}$ is necessarily a $(\theta_t)$-invariant probability, or 
$|P\theta_s - P\theta_t| \to 0$ as $s, t \to \infty$ under various restraints, say, $s = g(t)$ 
is a specific function of $t$ and, in particular, $g(t)/t \to 0$ or $g(t)/t \to 
c > 0$, a constant to be specified or not. I know of no work in this 
direction. Question: Introduce and discuss forms of asymptotic 
stationarity according to phenomenological or mathematical considerations. 
What happens to stationarity results, especially to the 
fundamental stationarity theorem? Add ergodicity and various 
asymptotic and asymptotic conditional independence conditions. 

Martingales. The definition of (sub) martingales associated by 
Doob to σ-fields $\mathcal{A}_t = \bigvee_{s \le t} \mathcal{B}_s$ may be written $X_s = (\le)\ 
E(X_t \mid \mathcal{A}_s)$ a.s. for $s < t$. They are (sub) martingales in the traditional sense 
but possess more properties when the $\mathcal{B}_t$ strictly contain $\mathcal{B}(X_t)$; when 
$\mathcal{B}_t = \mathcal{B}(X_t)$ so that $\mathcal{A}_t = \mathcal{B}(X_s, s \le t)$, then one recovers the 
traditional (sub) martingales $(X_t, t \in T)$. In its turn, this definition 
leads at once to an extension of the (sub) martingale structure: 
Let $\varphi_t$ be consistent (nondecreasing) signed measures on nondecreasing 
$\mathcal{A}_t$, that is, $\varphi_s = (\le)\ \varphi_t$ on $\mathcal{A}_s$ for $s < t$. The family 
$(\varphi_t, \mathcal{A}_t, t \in T)$ may be called a martingale (submartingale). It 
reduces to the preceding type if and only if the $\varphi_t$ are σ-finite and 
$P_{\mathcal{A}_t}$-continuous (with Radon-Nikodym derivatives $X_t$). However, 
in this new form, a simple property leads to new proofs and results: 
$\varphi_\infty = \lim_{t \to \infty} \varphi_t$ exists and is finitely additive on the field $\bigcup_{t \in T} \mathcal{A}_t$. 
To simplify, let $T = \{1, 2, \ldots\}$. (For nets $T$ important results 
were obtained by Krickeberg, for example, [43] and by Chow [7].) 

The consistency theorem becomes a martingales theorem on a 
specific probability space: For a martingale $(P_n, \mathcal{A}_n)$, where the 
$P_n$ are probabilities on the σ-fields of cylinders in $R^\infty$ with Borel 
bases in $R_1 \times \cdots \times R_n$, $P_\infty = \lim P_n$ is σ-additive on $\bigcup \mathcal{A}_n$ and 
hence determines a probability on $\bigvee \mathcal{A}_n$. This leads at once to the 
investigation of even traditional (sub) martingales $(X_n, n = 1, 
2, \ldots)$ but on their sample spaces $(R^\infty, \mathcal{S}^\infty)$. For example, it 
provides another proof of a.s. convergence of integrable (sub) 
martingales with, moreover, $\varphi = \lim \varphi_n$ bounded and σ-additive if 
(and obviously only if) $\sup E|X_n| < \infty$ ([54], also [63] and [55], pp. 
94, 402-407). In fact, in connection with this approach, there 
appear asymptotic behaviors of (to simplify) finite signed measures 
$\varphi_n$ which yield a.s. convergence with no other structure imposed. 
For example, if there exists such a measure $\varphi$ on $\bigvee \mathcal{A}_n$ with $|\varphi_n(A) - 
\varphi(A)| \le c(n)PA$ where $c(n) \to 0$, for every $A \in \mathcal{A}_n$ ($A \in \mathcal{A}^n$), 
then their Radon-Nikodym derivatives (of the continuous parts) 
$X_n$ converge a.s. to the Radon-Nikodym derivative of the restrictions 
to $\bigvee \mathcal{A}_n$ ($\bigcap \mathcal{A}^n$). This yields the a.s. convergence of (sub) 
martingales. Question: Would the same approach or a related one 
yield a.s. convergence for the stationarity theorem? 

We return to (sub) martingales $(\varphi_n, n = 1, 2, \ldots)$. The 
question of a.s. convergence arises when $\varphi = \lim \varphi_n$ is only finitely 
additive. A satisfactory answer is known when $\varphi$ is bounded [7]: 
$X_n \to X^*$ a.s., where $X^*$ is the Radon-Nikodym derivative of the largest σ-additive 
part of $\varphi$ as determined by the Hewitt-Yosida theorem [30]. Question: 
What happens to the Hewitt-Yosida theorem and to this 
proposition when $\varphi$ is σ-finite, then when the $\varphi_n$ are σ-finite? Question: 
What about a further modification with finitely additive $\varphi_n$ 
and corresponding Radon-Nikodym derivatives à la Bochner-Phillips? 

$(\mathcal{B}_t)$-classes of processes. A cycle of problems arises when, 
instead of one process, one considers classes of processes with random 
variables $X_t, Y_t, \ldots$ associated to a same family $(\mathcal{B}_t, t \in T)$ 
of σ-fields, or $(\mathcal{B}_t)$-classes. 

QUERY. Investigate and find operations which preserve a given 
structure of a $(\mathcal{B}_t)$-class. For example, the $(\mathcal{B}_t)$-class of (sub) martingales 
is closed under formation of linear combinations $aX_t + 
bY_t$ ($a \ge 0$, $b \ge 0$ for submartingales), suprema, transformation 
into $g(X_t)$, $h(X_t)$ by continuous convex functions (and nondecreasing 
for submartingales), and so on. Are there other such operations? 
What about various limit operations? The question admits variants: 
One may have to change the probability measure to preserve 
the structure under an operation. For example, let $X_n \ge 0$ 
with $EX_n = 1$ form a martingale and the $Y_n \ge 0$ form a 
submartingale, both associated to σ-fields $\mathcal{A}_n$. Then $P_\infty A = 
\lim \int_A X_n$, $A \in \bigcup \mathcal{A}_n$, exists but is not necessarily σ-additive, while 
its restrictions $P_n$ to $\mathcal{A}_n$ are probabilities. Yet [17] $(Y_k/X_k, 
k \le n)$ is a submartingale (martingale when the $Y_n$ form a martingale 
and the $X_n > 0$) associated to the $\mathcal{A}_k$ but with respect to $P_n$ 
and not $P$. If $P_\infty$ is σ-additive then $(Y_n/X_n, n = 1, 2, \ldots)$ is a 
submartingale (martingale when the $X_n > 0$) with respect to $P_\infty$. 

QUERY. Find subclasses of given $(\mathcal{B}_t)$-classes which “generate” 
the whole class. The term “generate” may be given various meanings 
(if any), say, by means of certain operations. For example, in 
a different direction, under relatively weak conditions [60], submartingales 
are “generated” by martingales in the sense that they 
can be decomposed into martingales and nondecreasing processes 
(which are always submartingales) associated to the same σ-fields. 
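
In discrete time such a decomposition is elementary and can be checked by enumeration. The sketch below is an illustration under assumptions not made in the text: the submartingale is $X_n = S_n^2$, where $S_n$ is the sum of $n$ fair $\pm 1$ steps, so that the nondecreasing part is $n$ and the martingale part is $S_n^2 - n$. The function names are hypothetical.

```python
from itertools import product

def decompose_square(path):
    """Split X_n = S_n^2 along a +-1 path into martingale and nondecreasing parts.

    Given the past, E[(S + e)^2] - S^2 = 1 for a fair step e, so the
    nondecreasing part grows by exactly 1 at each step.
    """
    position, increasing, martingale = 0, [0], [0]
    for step in path:
        position += step
        increasing.append(increasing[-1] + 1)
        martingale.append(position * position - increasing[-1])
    return martingale, increasing

def martingale_part_is_martingale(horizon):
    """Check E[M_{n+1} | first n steps] = M_n over all paths up to `horizon`."""
    for n in range(horizon):
        for prefix in product((-1, 1), repeat=n):
            now = decompose_square(prefix)[0][-1]
            nexts = [decompose_square(prefix + (e,))[0][-1] for e in (-1, 1)]
            if sum(nexts) != 2 * now:   # average over the two fair continuations
                return False
    return True
```

For the path $(1, 1, -1)$ the submartingale values $0, 1, 4, 1$ split into the martingale part $0, 0, 2, -2$ and the nondecreasing part $0, 1, 2, 3$.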

QUERY. Find within a given $(\mathcal{B}_t)$-class with a specified structure, 
the subclass which has also another type of structure. Some results 
in this direction are known. Let me refer to [70] where the problem 
of characterization of stationary processes $(\theta_n X, n = 0, \pm 1, 
\pm 2, \ldots)$ which can be obtained from independent and identically 
distributed random variables $\xi_n$ by means of a Borel function $g$ 
and the translations $\theta_n$: $X_n = g(\theta_n\{\ldots, \xi_{-1}, \xi_0, \xi_1, \ldots\})$, is 
discussed. The query may be reversed: examine those structures 
which may appear under various specified operations in $(\mathcal{B}_t)$-classes. 

2. Index Spaces. The traditional index space $T$ is a “time” 
set, a subspace of the real line with the usual structures inherited 
from it. At first glance, it permits us to describe the evolution of 
random phenomena. Yet, even from this phenomenological point 
of view, it may not be satisfactory. A space-time evolution may 
be required (quantum field theory). However, the most important 
fact is that the measurements are never instantaneous. They do 
not yield values at a fixed moment of time but a sort of average of 
the values in some neighborhood of this moment, say, $X(\omega, \varphi) = 
\int X(\omega, t)\varphi(t)\,dt$ where the functions $\varphi$ characterize the measuring 
apparatus. 

Phenomenological as well as mathematical considerations lead to 
the following types of nontraditional index spaces T . 

(1) $T$ is a linear and, in general, locally convex topological space. 
Most frequently, $T$ is a subspace of a multidimensional Euclidean 
space $R^n$ (or of a Minkowski space). The corresponding “random 
fields $X_T$” theory is at its beginning. $T$ may be a Hilbert space 
and, more generally, a Banach space; there is a fast-growing literature 
on this direction, pioneered by Mourier [62], Fortet [22], and 
others. 

(2) $T$ is a class of sets, especially, $T$ is the Borel field on $R$ or $R^N$. 
The corresponding “random measures” $X_T$ appear already in 
second order harmonic analysis. 

(3) $T$ is a functional space such as spaces of continuous functions 
on subspaces of $R$ or $R^N$. The most important ones are the spaces 
introduced by L. Schwartz and, in particular, the space $\mathcal{S}$ of infinitely 
differentiable functions $\varphi$ with $|t|^n \varphi^{(k)}(t) \to 0$ as $|t| \to \infty$ for every 
integer $n$ and every $k$th derivative $\varphi^{(k)}$, or the subspace $\mathcal{D}$ of such 
functions vanishing off compacts. The main operations in $\mathcal{D}$ which 
are transposable to random distributions are those which commute 
with shifts $\theta_h$, defined by $\theta_h\varphi(t) = \varphi(t + h)$. Such are linear 
combinations of shifts and their limits, say, “differentiation” 
defined by $\varphi'(t) = \lim_{h \to 0} (\theta_h\varphi(t) - \varphi(t))/h$, “filtering” by $K$ defined by 
$\int K(t - s)\varphi(s)\,ds$, and so on. 

The random functions $X_T$ are selected so as to reflect in some sense 
the structures of $T$. For linear topological index spaces they are 
linear and continuous. A linear random function is simply a linear 
mapping of a linear space $T$ to the linear space $\mathcal{R}$ of random variables 
on some probability space or to the space of their equivalence classes. 
When $T$ is provided with a topology compatible with its linear 
structure, then various types of continuous linear $X_T$ appear corresponding 
to the usual types of convergence in $\mathcal{R}$. If $T$ consists of 
differentiable functions $\varphi$, it is “natural” to define the derivative of 
$X(\varphi)$ by $X'(\varphi) = -X(\varphi')$ (this is suggested by formal differentiation 
of $\int X(t)\varphi(t)\,dt$ and integration by parts). The most important 
of such continuous linear random functions are random 
distributions. 
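
The rule $X'(\varphi) = -X(\varphi')$ gives a meaning to the derivative of a sample function that is not differentiable. The sketch below is a numerical illustration under assumptions not in the text: one fixed sample function with a unit jump at a hypothetical point 0.25, the standard bump function as $\varphi$, and a midpoint rule for the integral. In that case $-X(\varphi')$ should reduce to the value of $\varphi$ at the jump.

```python
from math import exp

def bump(t):
    """Infinitely differentiable test function vanishing off [-1, 1]."""
    return exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

def bump_derivative(t):
    """Derivative of `bump`, obtained by the chain rule."""
    if abs(t) >= 1:
        return 0.0
    return bump(t) * (-2.0 * t) / (1.0 - t * t) ** 2

def step_path(t, jump=0.25):
    """A sample function with a unit jump at `jump`; no classical derivative there."""
    return 1.0 if t >= jump else 0.0

def derivative_applied(path, cells=20000):
    """X'(phi) = -X(phi'), with X(psi) the integral of path * psi over [-1, 1]."""
    width = 2.0 / cells
    total = 0.0
    for i in range(cells):
        t = -1.0 + (i + 0.5) * width   # midpoint of the i-th cell
        total += path(t) * bump_derivative(t)
    return -total * width
```

Here `derivative_applied(step_path)` agrees with `bump(0.25)` up to the quadrature error: the derivative of the jump acts on $\varphi$ as a unit mass at the jump point.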

A random distribution $X$ is a linear mapping on $\mathcal{D}$ to $\mathcal{R}$ (or $\mathcal{R}/\mathcal{N}$) 
continuous, that is, $X(\varphi_n) \to X(\varphi)$ in pr. or a.s. or in the $r$th mean 
whenever $\varphi_n \to \varphi$ in the sense of convergence in $\mathcal{D}$: $\varphi_n \to \varphi$ if the $\varphi_n$ vanish off 
some compact and $\varphi_n^{(k)} \to \varphi^{(k)}$ uniformly for every $k = 0, 1, \ldots$. 
Similarly for “tempered random distributions” with $\mathcal{D}$ replaced 
by $\mathcal{S}$ with its type of convergence. 

The mathematical importance of index spaces of distributions 
for probability theory is due to the fact that in the traditional setup 
sample functions are, in general, extremely unwieldy from the 
point of view of classical analysis: they are not differentiable, their 
Fourier transforms do not exist, and so on. Yet, once random 
functions are transformed into random distributions, the whole 
distribution theory becomes available and sample analysis becomes 
possible. 

The corresponding “random distributions” theory was pioneered 
by Gelfand [26] and by Ito [32], and continued by Urbanik [84], 
Dudley [19], and others. 

Random distributions theory is only at its beginning. So far, 
only independence and stationarity structures were transposed to 
random distributions. $X(\varphi_1)$ and $X(\varphi_2)$ are independent whenever 
$\varphi_1\varphi_2 = 0$. $X_{\mathcal{D}}$ is stationary when $\mathcal{L}(X(\varphi_1), \ldots, X(\varphi_n)) = 
\mathcal{L}(X(\theta_h\varphi_1), \ldots, X(\theta_h\varphi_n))$ for every shift $\theta_h$ and every finite 
subset $(\varphi_1, \ldots, \varphi_n)$ of $\mathcal{D}$, and $X_{\mathcal{D}}$ is second-order stationary if 
$EX(\varphi) = EX(\theta_h\varphi)$, $EX(\varphi_1)X(\varphi_2) = EX(\theta_h\varphi_1)X(\theta_h\varphi_2)$ for every $\varphi$, 
every pair $\varphi_1$, $\varphi_2$, and every $\theta_h$. There is need for a large scale 
systematic investigation: Transpose to random distributions, and 
investigate further, with all the analytical tools thus become available, 
the foregoing total and asymptotic structures, the traditional 
results and the foregoing queries. 

There is another modification of index spaces which already 
proved its worth in Markov processes but is not yet used elsewhere. 
The index spaces $T(\omega)$ vary with $\omega \in \Omega$. In Markov theory 
they are time intervals with random right-end points. In Martin 
boundary theory, Hunt [31] introduced random left- and right-end 
points: $T(\omega) = [\alpha(\omega), \beta(\omega)]$ with $\alpha$ and $\beta$ not necessarily finite. 
From a phenomenological point of view, such index spaces are 
natural for construction of models: a random phenomenon may 
be expected to have a random birth $\alpha$ and a random death $\beta$. There 
is need for a systematic investigation of the whole traditional setup, 
its results and problems, but with random index spaces. 

3. Abstract Sample Spaces. The definition of a random variable 
takes into account the usual topology of the real line indirectly, 
through the construction of its σ-field of Borel sets. Yet, all it 
requires is a measurable sample space $(\mathcal{X}, \mathcal{S})$. This leads at once 
to “abstract” random variables $X$, that is, functions $X$ on $(\Omega, \mathcal{A})$ 
to $(\mathcal{X}, \mathcal{S})$ such that $X^{-1}(\mathcal{S}) \subset \mathcal{A}$. From this point of view, any 
random function $X_T = (X_t, t \in T)$ (whether the $X_t$ are numerical 
or abstract) is itself a random variable with sample space $(\mathcal{X}^T, \mathcal{S}^T)$. 
However, this space has already some structure: it is a space of 
functions on $T$ and sample analysis has content provided there is 
meaning to properties of these functions, such as continuity, 
measurability, and so on. 

Topological and measurability structures of this functional space 
are, in general, the corresponding product structures of those 
imposed on $\mathcal{X}$. (Most frequently, measurability in $\mathcal{X}$ is introduced 
through its topology: $\mathcal{S}$ is the Borel field, the one generated by the 
topology, or $\mathcal{S}$ is one of the Baire fields, the smallest one for which 
continuous functions belonging to a subfamily of them are measurable.) 
Also $X_T$ may be of second order and considered as a curve in 
the Hilbert space $L_2(\Omega, \mathcal{A}, P)$ of equivalence classes of random variables, 
or the $X_T(\omega)$ may be signed measures, and so on. 

Whichever the case, we are faced with random variables which 
take values in some structured space such as a linear topological 
space or a compact or locally compact topological group. Their 
appearance may be due to some more or less “soft” generalization: 
given a proposition in the traditional setup, one may seek to impose 
structures upon $\mathcal{X}$ under which the proposition remains valid, in 
part or in toto. Or they may be due to a more or less “hard” 
generalization: in a mathematical problem with a given $\mathcal{X}$ there 
appear beings or methods which recall probabilistic ones, and this 
leads to an investigation of the similarities but within the setup of 
the problem. 

In fact, the first abstract random variables appeared in concrete
problems: as far back as 1928, F. Perrin [6, 7] studied rotations of a
sphere immersed in a fluid whose particles perform a Brownian
motion. After $n$ units of time, its total rotation is $\rho_n \rho_{n-1} \cdots \rho_1$,
where the $\rho$'s are independent random rotations. This raises the
problem of (multiplicative) laws of large numbers for random variables
with values in an orthogonal group (noncommutative). Ordered
linear lattice chains of particles subject to contiguous random
forces are governed by linear difference equations, say,
$u_n = \xi_n u_{n-1} + \eta_n u_{n-2}$ with $(\xi_n, \eta_n)$ independent and identically
distributed random vectors. Then

$\begin{pmatrix} u_n \\ u_{n-1} \end{pmatrix} = X_n X_{n-1} \cdots X_1 \begin{pmatrix} u_0 \\ u_{-1} \end{pmatrix}$,

where the $X_n = \begin{pmatrix} \xi_n & \eta_n \\ 1 & 0 \end{pmatrix}$ are random matrices, and the
asymptotic behavior of $u_n$ is governed by the matrix product
$X_n X_{n-1} \cdots X_1$. Random phenomena governed by differential
equations lead to random variables in a Lie group, and so on.
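
The passage from the difference equation to a product of random matrices can be checked numerically. The following is only a minimal sketch under illustrative assumptions: the coefficients $(\xi_n, \eta_n)$ are drawn uniformly from $(0.5, 1.5)$, and the helper names `companion`, `matmul`, and `simulate` are hypothetical, not taken from the text.

```python
import random

def companion(xi, eta):
    """Matrix X_n that sends (u_{n-1}, u_{n-2}) to (u_n, u_{n-1})."""
    return [[xi, eta], [1.0, 0.0]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def simulate(n_steps, u0, u_minus1, seed=0):
    """Return u_n computed by the recursion and by the matrix product."""
    rng = random.Random(seed)
    # Assumed law of (xi_n, eta_n): independent uniforms on (0.5, 1.5).
    coeffs = [(rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5))
              for _ in range(n_steps)]

    # Direct recursion u_n = xi_n u_{n-1} + eta_n u_{n-2}.
    prev2, prev1 = u_minus1, u0
    for xi, eta in coeffs:
        prev2, prev1 = prev1, xi * prev1 + eta * prev2
    u_direct = prev1

    # Matrix product X_n X_{n-1} ... X_1 (newest factor on the left).
    prod = [[1.0, 0.0], [0.0, 1.0]]
    for xi, eta in coeffs:
        prod = matmul(companion(xi, eta), prod)
    u_matrix = prod[0][0] * u0 + prod[0][1] * u_minus1
    return u_direct, u_matrix

u_direct, u_matrix = simulate(n_steps=15, u0=1.0, u_minus1=0.5)
print(u_direct, u_matrix)
```

Both numbers agree up to rounding, which is all the sketch is meant to show; it says nothing about the asymptotic growth rate of the product.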

Thus, “hard” generalizations become unavoidable. They have
to proceed from given structured sample spaces, and the problems
are mostly guided by or are similar to the traditional ones. The
need for “abstract” expectations, that is, abstract integrals, arises.
Quite a collection of them, and procedures for forming the needed
ones, are available (see [58]). There is also need for “abstract”
conditional expectations or derivatives of Radon-Nikodym type.
Operator treatments, begun by Moy [64], are available here.

At present, the problems under investigation are centered about
independent and identically distributed random variables in the
direction of convergence of laws and the strong law of large numbers.
They require a high degree of mathematical sophistication. Yet
the subject is at its very beginning, and even the transposition of
various stochastic structures and traditional results remains to be
done. Since there is a recent “exposé d’ensemble” by Grenander
[28] of known results with an exhaustive bibliography, I shall be
content to refer to it, adding only a few more recent papers I think
of importance, by Furstenberg [23] and Gangolli [24-25] (there are
many others as well).

I would like to mention here an unpublished result of Strassen
which shows how new phenomena appear in a most classical problem:
Let $(T, d)$ be a compact metric space and let the $X_T^{(n)}$ be
independent and identically distributed square integrable random
functions with continuous sample functions, that is, abstract
random variables with values in the space $C$ of continuous functions
on $T$. Center the $X_t^{(n)}$ at their expectations: $EX_t^{(n)} \equiv 0$. It
may be expected that the usual normal convergence holds, that is,
$\mathcal{L}\left(\sum_{k=1}^{n} X_T^{(k)}/\sqrt{n}\right)$ converges to a normal law in $C$. Yet, let $H_\delta$
be the $\delta$-entropy of $T$, that is, the $\log_2$ of the smallest cardinal of
$\delta$-dense subsets $S$ of $T$ ($d(t, S) < \delta$ for every $t \in T$). To simplify,
assume that $T$ is “not too big” in the sense that $\delta^2 H_\delta$ remains
bounded as $\delta \downarrow 0$ and that

$\theta(\delta) = \operatorname{ess\,sup}_{\omega} \sup_{d(s,t)<\delta} |X_s^{(n)} - X_t^{(n)}| < \infty$.

Then normal convergence holds if there exist constants $c > 0$ and
$a > 1$ such that $\theta(\delta) \le c/\left[(\log 1/\delta)^{a} \sqrt{H_\delta}\right]$. However, given any
$T$ as described above, there exist $X_T^{(n)}$ with $\theta(\delta) \le 1/(\log 1/\delta)$
for which normal convergence does not hold; in fact, there is no
convergence of laws!

4. Extensions. Sample spaces, selected in advance, are frequently
too poor for some problems and have to be extended accordingly.
I shall be content with a few examples.

In the traditional setup, the definition of random variables as
taking values in the real line $R$ breaks down at almost every step.
The topologies for the space $\mathcal{R}$ of random variables on $(\Omega, \mathcal{A}, P)$,
and for its subspaces of random variables $X$ with $E|X|^r < \infty$,
corresponding to various types of convergence based upon $P$, are not
separated. To make the limits unique one has to, and constantly does,
identify equivalent measurable functions. This introduces random
variables which can take infinite values but with zero probability.
Furthermore, $\mathcal{R}$ is not closed under a.s. passages to the limit
and this introduces random variables which can take infinite values
with positive probability. Also sequences $\{F_n\}$ of laws are not
compact in the sense of their traditional type of convergence yet
become compact in the sense of the weaker type of convergence:
$\int g\,dF_n \to \int g\,dG$ for every $g \in C_0$, the space of continuous
functions vanishing at infinity. While $G$ is still a measure, $G(R)$ may
be less than one and $1 - G(R)$ may be thought of as that part of
the probability masses which escapes to infinity (or ceases to be
observable). This compactness is in constant use in the traditional central
limit problem. Thus, even within the traditional setup there is
constant need to consider “extended” random variables with the
extended real line and its Borel field. There are many partial
results for such random variables. While it would be little more
than an exercise, it may be useful to examine systematically the
traditional problems and results, in particular the central limit
problem, for extended random variables.
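
The escape of mass to infinity can be illustrated on a very small example. The sketch below assumes the laws $F_n = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_n$ and the test function $g(x) = e^{-x^2}$ in $C_0$; both are illustrative choices, not taken from the text. Here $\int g\,dF_n \to \int g\,dG$ with $G = \frac{1}{2}\delta_0$, so that $1 - G(R) = \frac{1}{2}$.

```python
import math

def g(x):
    """A continuous function vanishing at infinity (an element of C_0)."""
    return math.exp(-x * x)

def integral_against_Fn(func, n):
    """Integral of func with respect to F_n = (1/2) delta_0 + (1/2) delta_n."""
    return 0.5 * func(0.0) + 0.5 * func(float(n))

# Integral of g against the limit G = (1/2) delta_0, a measure of total mass 1/2.
limit_value = 0.5 * g(0.0)

# The integrals against F_n approach limit_value as n grows.
values = [integral_against_Fn(g, n) for n in (1, 5, 10, 50)]

# Mass that has escaped to infinity: 1 - G(R).
escaped_mass = 1.0 - 0.5

print(values, limit_value, escaped_mass)
```

The same sequence does not converge in the traditional sense to any probability law on $R$, which is why the weaker convergence is the one under which compactness is recovered.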

Another example of necessary extensions, indicative of the
technicalities involved in Markov theory, is the Martin completion, first
constructed by Martin for the classical theory of potential, then
transposed by Doob [18] (also Watanabe [86]) to the theory of Markov
processes. To simplify, let $\mathcal{X} = (1, 2, \ldots)$, assigned its discrete
topology. Let $X_T$ be a stationary Markov function with transition
probability $P_t(x, S)$, hence corresponding to a Markov semigroup
$(P_t, t \in T)$ defined by $(P_t f)(x) = \int P_t(x, dy) f(y)$ (it is a series with
our $\mathcal{X}$) on the space of nonnegative measurable functions $f$ on $\mathcal{X}$.
A function $f$ is $(P_t)$-invariant (or “regular” or “harmonic,” according
to the authors) when $P_t f = f$, equivalently, when the $f(X_t)$ form
a martingale. The problem is to find “representations” of such
functions and of the asymptotic behavior of the sample functions of
$X_T$. To simplify, assume that $T = (1, 2, \ldots)$ and that all states
are transient, that is, the “potential” $U(x, S) = \sum_t P_t(x, S)$ is finite
for all finite $S$, and let $\gamma$ be a probability with $\gamma S > 0$ for all
nonempty $S$. Form $K(x, y) = U(x, \{y\}) / \int \gamma(dz)\, U(z, \{y\})$ and
complete $\mathcal{X}$ with respect to the uniformity induced upon taking for
Cauchy sequences $\{y_n\}$ either those with almost all $y_n$ equal or
those with almost all $y_n \notin S$ for any given finite $S$ and such that the
$K(x, y_n)$ form a Cauchy sequence for every $x$. This is the Martin
completion and the set $\mathcal{M}$ of new points is the Martin boundary.
$\mathcal{M}$ is this boundary, a compact metric space with the measure $\gamma$ on its
Borel field $\mathcal{B}$. It permits us to describe the asymptotic behavior of
the sample functions and also carries the representation of harmonic
functions $f$: There exists a one-to-one correspondence between
such $f$ and the family of finite measures on $\mathcal{B}$. Let us mention a
provocative interpretation of this representation by Blackwell
(unpublished): He establishes the same correspondence but with
measures $\mu$ on the invariant (under shifts of coordinates) $\sigma$-field of
Borel sets in $\mathcal{X}$. Also the need for the Martin boundary proper
disappears. Why? The connection between the Martin boundary
and another, the Feller boundary (and between their applications),
deserves to be investigated further following Feldman [21].

Let us take this opportunity to mention several “exposés d’ensemble”
of Markov theory and of its subfields and applications by
Dynkin [19a], Itô and McKean [33], and Meyer [61].

As a last example, in a different direction, let me mention Kendall’s
use [36] of the Sz.-Nagy extension of Hilbert spaces. He observes that
there frequently exists an excessive measure $\mu$ on the Borel field $\mathcal{S}$
of $\mathcal{X}$, that is, $\mu S \ge \int \mu(dx) P_t(x, S)$. The space $L_2(\mu)$ on $\mathcal{X}$ is a
Hilbert space $H$ with inner product $(f, g) = \int f(x) g(x)\,\mu(dx)$, and the
operators $P_t$ on $H$ form a semigroup of contractions [65]. $H$ is
imbedded into a larger Hilbert space on which the $P_t$ become unitary
operators with well-known integral representation. It suffices
then to project them on $H$.

5. Probability Spaces. A probability space $(\Omega, \mathcal{A}, P)$ is but a
frame of reference. From a phenomenological point of view it is
a mathematical fiction. Ideally, what one observes are values of
random variables, that is, points of a sample space. In practice,
observations yield only the fact that the values belong to some sets:
events occur. All one may hope to achieve is a reasonably close
approximation of the probability law of some random function $X_T$.
The “observable” probability space is its sample probability space.
Then why the underlying probability space? The reason is primarily
that of mathematical convenience. The probability space
is ubiquitous and protean. It may be taken to be any observable
one. It may be selected to be rich enough to carry any desired
family of probability laws with any structure. But more is possible,
as follows.

Probability algebras. If one does not believe in miracles, that is,
occurrences of any specified event of probability zero (the statisticians
and the physicists do not), or if one desires to avoid the
technicalities such events entail (the probabilists may), these events
may be eliminated: Identify events whose symmetric difference
is of probability zero. Then $d(A, B) = P(A \triangle B)$ is a distance
and the probability space becomes a complete metric algebra of
events with only the zero element being of probability zero. The
points $\omega \in \Omega$ disappear, but then they have little meaning, if any.
In fact, Kolmogorov [40], Halmos [29], and others sought to replace
probability spaces by probability algebras. However, according to
the Loomis-Sikorski theorem, every probability algebra can be
represented by a probability space modulo sets of probability zero.
Thus, it appears that to start with underlying probability spaces or
probability algebras is a matter of taste. Furthermore, the fact
that we may use the Stone space representation provides us with a
powerful tool, as follows.

Stone space representation. Given a probability space $(\Omega, \mathcal{A}, P)$,
the ($\sigma$-)field of events modulo null events is isomorphic to the field
$\mathcal{F}$ of clopen sets (simultaneously open and closed) in a compact
Hausdorff space $\Omega'$ where $\mathcal{F}$ is a base for the topology. To each
Baire set $A'$, element of the $\sigma$-field $\mathcal{A}'$ generated by this field, there
corresponds a unique clopen image of an event $A$ (which differs
from $A'$ by at most a meager set). Then $P'A' = PA$ (and $P'$-null
sets are meager sets and conversely) defines a probability $P'$ on
$\mathcal{A}'$. To equivalence classes of random variables $X$ on $\Omega$ there
correspond equivalence classes of random variables $X'$ on $\Omega'$ preserving
$L_p$-norms, linearity, ordering, a.s. sequential convergence, and so on.
Thus, whenever convenient, we can select $(\Omega', \mathcal{A}', P')$ as our
probability space. Let me mention a few advantages.

From a phenomenological point of view, a maximal trial, that is,
observation of occurrences or nonoccurrences of all possible events,
ought to determine the “state of nature” $\omega$. In mathematical
terms it would be a maximal filter of events, and the above
representation has the required property.

In a different direction, a finitely additive non-negative set
function on the field of clopen sets is vacuously $\sigma$-additive and thus
determines a measure on $\mathcal{A}'$. Also, one may form the metric algebra
$(\mathcal{A}, d)$ even when $P$ is finitely additive and, by completing with
respect to the metric, obtain $\sigma$-additivity. From this point of view,
the quarrel between the proponents of finitely additive probability
and the proponents of $\sigma$-additivity is of no mathematical importance.
On the other hand, this approach may be of importance for
(sub)martingales, say $(X_n \ge 0, \mathcal{A}_n)$. The limit $\varphi = \lim \varphi_n$ is finitely
additive on $\bigcup \mathcal{A}_n$. If it is not $\sigma$-additive, use the foregoing methods
to extend it to a measure. Then investigate the asymptotic
behavior of the sequence $X_n$; similarly for general (sub)martingales
$(X_n, \mathcal{A}_n)$.

I know of only one specific probabilistic instance where the Stone
space representation was exploited. Doob [18a] proved the Rota
[78a] limit theorem on bistochastic operators using a correspondence
with Markov endomorphisms: Let $T$ be a linear positive operator on
$L_1(\Omega, \mathcal{A}, P)$ to itself, with $T1 = 1$. To $T$ there corresponds $T'$
of the same type on $L_1(\Omega', \mathcal{A}', P')$ defined by $T'(f') = (Tf)'$, the image
of $Tf$ in the above representation. Then $T'$ is a Markov
endomorphism in the sense that there exists a transition probability
$P'(x', A')$ with $(T'f')(x') = \int P'(x', dy') f'(y')$ for almost every $x'$.
This representation may perhaps be exploited for ergodic theory.

Conditional probability spaces. Applications of probability theory
to integral geometry and to the theory of numbers seem to point
out that the fundamental concept of one underlying probability $P$
is too restrictive. There appear nonfinite measures $\nu$ on $\sigma$-fields
and the problems require at once the use of families of “conditional”
probabilities $P_C$ defined by $P_C A = \nu AC / \nu C$ for the sets $C$ with
$\nu C < \infty$. To take care of this need, Rényi [72] introduced
conditional probability spaces $(\Omega, \mathcal{A}, P_C, \mathcal{C})$ with conditional
probabilities $P_C$ on classes $\mathcal{C} \subset \mathcal{A}$ as the primary datum. They are
axiomatized by

(i) The $P_C$ are probabilities on $\mathcal{A}$ with $P_C C = 1$.

(ii) If $A$ and $B$ with $P_C B > 0$ are events and $BC \in \mathcal{C}$, then
$P_{BC} A = P_C AB / P_C B$.

Note that when $\Omega \in \mathcal{C}$, the $P_C$ are traditional conditional probabilities
given $C$ derived from one probability $P = P_\Omega$.
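
Axiom (ii) can be checked on a small instance. The sketch below is an assumption-laden illustration: it takes for the nonfinite measure the counting measure on the positive integers, restricts the conditioning sets to finite nonempty ones, and truncates the events $A$ and $B$ at 100 only so that they fit in memory. The function name `cond_prob` is hypothetical.

```python
from fractions import Fraction

def cond_prob(A, C):
    """P_C(A) = nu(A ∩ C) / nu(C) for counting measure nu; C finite, nonempty."""
    if not C:
        raise ValueError("C must have positive finite measure")
    return Fraction(len(A & C), len(C))

# Conditioning set C with nu(C) = 12.
C = frozenset(range(1, 13))
# Events (truncated at 100 for representation only).
A = frozenset(n for n in range(1, 101) if n % 2 == 0)   # even numbers
B = frozenset(n for n in range(1, 101) if n % 3 == 0)   # multiples of 3

# Axiom (i): P_C C = 1.
norm = cond_prob(C, C)

# Axiom (ii): P_{BC} A = P_C(AB) / P_C B, valid here since P_C B > 0.
lhs = cond_prob(A, B & C)
rhs = cond_prob(A & B, C) / cond_prob(B, C)

print(norm, lhs, rhs)
```

Exact rational arithmetic is used so that the two sides of axiom (ii) can be compared without rounding.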

So far, only the traditional normal convergence and approximation,
together with the law of the iterated logarithm, were used in the
applications to number theory. I shall be content to refer to the
book by Kubilius [44] and the exhaustive literature quoted
therein. It would be of interest to transpose systematically the
various stochastic structures and their properties to the Rényi setup
and its particular cases.

Expectation spaces. To conclude, let me indicate rapidly new
fissures which become visible when probability concepts are used
in quantum theory. There appear two families of (abstract) random
variables, “observables” $X$ and “observables” $Y$, which are
self-adjoint operators on a Hilbert space of “state functions”
$\psi(x, y, z)$ and which commute within each family only. Their
observable “values” are their eigenvalues and their probability
laws in a state $\psi$ are given by the densities $\bar\psi X \psi$ and $\bar\psi Y \psi$.
However, there are no probability laws for pairs $(X, Y)$ of noncommuting
operators and the traditional interpretation of all $X$ and $Y$ as the
space of random variables on a common probability space disappears.
Various ways to circumvent this obstacle were given by von
Neumann, Segal, Wiener, and others. But one may try to “absorb”
this obstacle by enlarging accordingly the traditional setup of
probability theory, as follows [52].

Since in quantum theory the concept of expectation with its
usual properties remains valid, this leads to the introduction of
expectation (in lieu of probability) as a primary datum. One may
use, say, the following “expectation spaces” setup. On a measurable
space $(\mathcal{U}, \mathcal{A})$ define “abstract” random variables $X$ to a common
space, with the structures required below, either directly or,
proceeding as in the numerical case, introduce “abstract” indicators
$I_A$ with $I_A^2 = I_A$, then simple or elementary “abstract” random
variables, and then general random variables $X$. Formally,
$X = \int x(u)\,I(du)$, where $x(u)$ is a real or complex-valued $\mathcal{A}$-measurable
function and $I(du)$ is a “decomposition of the identity” indicator
on $\mathcal{A}$. There may be several such measurable spaces $(\mathcal{U}, \mathcal{A})$, $(\mathcal{V}, \mathcal{B})$,
and so on, with corresponding random variables $X$, $Y$, ... to a
common range space. Introduce the expectation $E$: a positive
linear operator on “positive” random variables $X, Y, Z, \ldots$ to
the (extended) real line or, more generally, to the (extended) complex
plane, with $E1 = 1$ and $E(\lim Z_n) = \lim EZ_n$ for $Z_n \uparrow Z$, and
extend by linearity; formally $EX = \int x(u)\,EI(du)$. For example,
formally the traditional setup corresponds to one family of random
variables $X = \int x(u)\,I(du)$ with finite measurable functions $x(u)$
and a common resolution of the identity indicator $I(du)$ on $(\mathcal{U}, \mathcal{A})$,
with correspondence $P(du) = EI(du)$ between the expectation
operator and a probability measure $P$ on $\mathcal{A}$. Or $I(du)$ may be a
resolution of the identity operator on a Hilbert space. The expectation
operator $E$ and a self-adjoint positive-definite operator
whose trace is one, a “probability operator” $P$, correspond by
$EX = \operatorname{trace} PX$; in fact, there is a probability measure on $\mathcal{A}$
determined by $\operatorname{trace}(PI(du))$. Or there may be two resolutions $I(du)$
and $J(dv)$ of the identity operator on a Hilbert space and
corresponding families of random variables $X = \int x(u)\,I(du)$ and
$Y = \int y(v)\,J(dv)$. In general, the self-adjoint operators $X$ and $Y$ do not
commute and the products $Z = XY$ are not self-adjoint. However,
$\Re(XY) = (XY + YX)/2$ and $\Im(XY) = (XY - YX)/2i$ are self-adjoint and
hence correspond to resolutions of the identity. There still exists
a “probability operator” $P$ such that, say,
$EZ = \operatorname{trace}(P\,\Re(XY)) + i \operatorname{trace}(P\,\Im(XY))$.
The operators with a common resolution $H(dw)$
of the identity are, in fact, defined on a common probability space
with probability measure given by $EH(dw)$.
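
A finite-dimensional sketch may make the last construction concrete. It assumes a two-dimensional Hilbert space, takes for $X$ and $Y$ the Pauli matrices $\sigma_x$ and $\sigma_y$, and for the “probability operator” the diagonal matrix with entries $3/4$ and $1/4$; these choices and the helper names are illustrative assumptions, not part of the text.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(a, b, coeff_b):
    """Entrywise a + coeff_b * b."""
    n = len(a)
    return [[a[i][j] + coeff_b * b[i][j] for j in range(n)] for i in range(n)]

def scale(a, c):
    """Entrywise multiplication of a by the scalar c."""
    return [[c * entry for entry in row] for row in a]

def adjoint(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def is_self_adjoint(a):
    n = len(a)
    adj = adjoint(a)
    return all(abs(a[i][j] - adj[i][j]) < 1e-12
               for i in range(n) for j in range(n))

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

X = [[0, 1], [1, 0]]          # sigma_x, self-adjoint
Y = [[0, -1j], [1j, 0]]       # sigma_y, self-adjoint
P = [[0.75, 0], [0, 0.25]]    # positive, trace one

XY, YX = matmul(X, Y), matmul(Y, X)
re_part = scale(combine(XY, YX, 1), 0.5)          # (XY + YX)/2
im_part = scale(combine(XY, YX, -1), 1 / (2j))    # (XY - YX)/(2i)

# Expectation of Z = XY via the decomposition, and directly as trace(P Z).
ez = trace(matmul(P, re_part)) + 1j * trace(matmul(P, im_part))
direct = trace(matmul(P, XY))
print(ez, direct)
```

Here $XY$ itself is not self-adjoint, while both parts of the decomposition are, and the two ways of computing the expectation coincide.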

The foregoing “expectation spaces” setups may be of phenomenological
interest and contain the traditional and the quantum
theory setups. Thus, it would be of interest to investigate them
systematically and search for transpositions of the various stochastic
structures and properties.

Random Integrals
of Differential Equations

J. Kampé de Fériet


INTRODUCTION 

At the beginning of our brief account of the present status of
research on random integrals of differential equations, we would
like to point out that the mathematical theory, which has developed
in the last three or four decades, has its motivation in physics; this
physical background is quite apparent, even in the most abstract
parts of the field. The search for a theoretical model of the Brownian
motion of a particle and the turbulent flow of a fluid has quite
naturally led to the definition of random integrals of ordinary or
partial differential equations respectively; in this last domain an
extension of the statistical mechanics of holonomic systems to
continuous media has given a definite orientation to the whole
development. This is the only excuse that an applied mathematician
(who has devoted a large part of his research to fluid dynamics) can
use in discussing these mathematical problems, which are becoming
more and more abstract every day.$^{1}$

1 As an illustration of this trend, compare the expositions of the theory of
Markov processes by Paul Lévy [G.1.3] and E. B. Dynkin [G.1.2].

In an attempt to make this survey as self-contained as possible
and for the benefit of readers not familiar with probability theory,
we have condensed in an Appendix the definitions and properties of
random functions which are extensively used in the research we
are reviewing.

The asterisk (*) after a word indicates that the word is defined
in the Appendix.

PART A 

1. Statistical Mechanics of Holonomic Systems. It is an interesting
historical fact that random integrals of ordinary differential
equations were used extensively twenty years before the theory of
random functions had its first mathematical formulation.$^{2}$ J. W.
Gibbs [A.1.1] considers mechanical systems $\Sigma$ which are holonomic,
that is, their state can be defined by a finite number of coordinates:
the $k$ configuration parameters

$q_1, q_2, \ldots, q_k$

and the $k$ conjugate momenta

$p_1, p_2, \ldots, p_k$.

Thus, a state of $\Sigma$ is represented by a point $\omega$ in the phase space$^{3}$ $\Omega$;
the motion of $\Sigma$ is defined by the $2k$ Hamilton-Jacobi differential
equations:

(1.1) $\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}, \qquad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}, \qquad j = 1, \ldots, k,$


where $H$ (the Hamiltonian) is the total energy $E$ of $\Sigma$ (kinetic +
potential). As a rule one supposes that $\Sigma$ is conservative, that is,
$H$ is a function of the $(q_j, p_j)$ but not of the time $t$. Under some
smoothness conditions for $H$ (which are satisfied for all the systems
under consideration in rational mechanics), one can prove an
existence, uniqueness, and continuity theorem for (1.1): between the
initial state $\omega$ of $\Sigma$ at $t = 0$ and its state $\omega_t$ at any time $t$,
$-\infty < t < +\infty$, there is a one-one bicontinuous correspondence

$\omega \leftrightarrow \omega_t$

which can be expressed in the form

(1.2) $\omega_t = T_t \omega$.

The $T_t$ define an Abelian group of homeomorphisms of $\Omega$ onto itself,
the property

$T_{t+s} = T_t T_s = T_s T_t$

expressing the Huygens principle for $\Sigma$.

For a given $\omega$, the set of all $T_t\omega$, where $t$ ranges over the reals,
defines a trajectory (or orbit) $\Gamma(\omega)$; any phase function $F(\omega)$ constant
along each trajectory is a first integral of (1.1).

2 The fundamental paper of J. W. Gibbs was published in 1902, and the first
random function* was introduced in a rigorous mathematical setting by
N. Wiener in 1923; moreover, for a system in statistical equilibrium, the
integrals are stationary random functions*; nevertheless it was not until 1934
that a clear mathematical definition of a stationary random function was given
by A. N. Khintchine.

3 As a set, $\Omega$ can be considered a subset of $R^{2k}$, but the natural metric on $\Omega$ is
not Euclidean. The configuration space $(q_1, \ldots, q_k)$ is a Riemannian
manifold whose metric is defined by the kinetic energy of $\Sigma$; the phase space $\Omega$
itself has to be deduced from the configuration space by a construction described
in [G.3.1, pp. 34-37].
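
The group property and the notion of a first integral can be illustrated on the simplest conservative system. The sketch below assumes a single harmonic oscillator with $H = (p^2 + q^2)/2$ (unit mass and frequency, an illustrative choice not made in the text), for which the flow $T_t$ is known in closed form as a rotation of the phase plane. The names `flow` and `energy` are hypothetical.

```python
import math

def flow(t, state):
    """Exact flow T_t for H = (p^2 + q^2)/2: solves dq/dt = p, dp/dt = -q."""
    q, p = state
    c, s = math.cos(t), math.sin(t)
    return (q * c + p * s, -q * s + p * c)

def energy(state):
    """The Hamiltonian, a phase function constant along each trajectory."""
    q, p = state
    return 0.5 * (p * p + q * q)

omega0 = (1.0, 0.5)     # initial state in the phase plane
t, s = 0.7, 1.9

# Group property: T_{t+s} = T_t T_s = T_s T_t.
a = flow(t + s, omega0)
b = flow(t, flow(s, omega0))
c = flow(s, flow(t, omega0))

# First integral: the energy is the same at omega0 and at T_t omega0.
e0, e1 = energy(omega0), energy(flow(t, omega0))
print(a, b, c, e0, e1)
```

For this system the trajectories are circles in the phase plane, and the energy labels the circle on which the state moves.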

Now the statistical mechanics of $\Sigma$ is based on the assumption
that the initial state $\omega$ is chosen at random in $\Omega$ according to a given
probability law $P$

(1.3) $\operatorname{Prob}[\omega \in A] = P(A)$

(when $A$ belongs to a prescribed $\sigma$-algebra of subsets of $\Omega$). $\Sigma$ being
conservative, one is especially interested in statistical equilibrium;
thus one considers only probability measures $P$ invariant under
the group of homeomorphisms $T_t$

(1.4) $P(T_t A) = P(A)$.

Thus any phase function $F(\omega)$ ($P$-measurable) gives a stationary
random function* $F(T_t\omega)$ and all the statistical properties of the motion
are independent of the time $t$, that is, the mean or expectation* $E(F)$ of any
phase function $F(\omega) \in L_1(\Omega, P)$ is independent of $t$. In particular,
the $2k$ integrals of the equations (1.1), $[q_j(T_t\omega), p_j(T_t\omega)]$, are
stationary random functions. Thus the statistical mechanics of
Gibbs really dealt with random integrals of the Hamilton-Jacobi
equations.

This fact is obscured by the language used by Gibbs and by most
present-day physicists. They consider a very large ensemble of
$N$ identical copies $\Sigma_1, \ldots, \Sigma_N$ of $\Sigma$, moving independently of each
other (there is absolutely no interaction between two systems
$\Sigma_j$ and $\Sigma_k$! No collisions as in the kinetic theory!), and suppose
that the number $N_A$ of systems whose states are represented by a
point $\omega$ in $A$ is given by

(1.4) $\frac{N_A}{N} = P(A)$.

In this context statistical equilibrium simply means that $N_A$ is
independent of the time $t$. The cloud of points $\omega_1, \ldots, \omega_N$
distributed in $\Omega$ according to (1.4) might seem more concrete to the
physicist than the probability $P(A)$, but it must be pointed out
that it is only an approximation of the strong law of large numbers:

(1.5) $\operatorname{Prob}\left[\lim_{N \to +\infty} \frac{N_A}{N} = P(A)\right] = 1$,

the location of the points $\omega_1, \ldots, \omega_N$ in $\Omega$ being assimilated to a
sequence of $N$ independent trials. The idea of an ensemble introduced
by Gibbs (1902) found a rigorous justification only in 1930,
when the proof by A. N. Kolmogoroff of the strong law gave a
precise mathematical meaning to (1.4).
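
The relation between the ensemble count $N_A/N$ and the probability $P(A)$ can be seen in a small simulation. The sketch assumes, purely for illustration, that the phase space is the interval $[0, 1)$ with the uniform law $P$ and that $A = [0, 0.3)$, so that $P(A) = 0.3$; the function name and the seed are hypothetical choices.

```python
import random

P_A = 0.3   # P(A) for A = [0, 0.3) under the assumed uniform law on [0, 1)

def ensemble_fraction(n_systems, seed=0):
    """Fraction N_A / N of N independent copies whose state falls in A."""
    rng = random.Random(seed)
    n_in_A = sum(1 for _ in range(n_systems) if rng.random() < P_A)
    return n_in_A / n_systems

# The fraction settles near P(A) as the ensemble grows.
for n in (100, 10_000, 100_000):
    print(n, ensemble_fraction(n))
```

For any finite $N$ the equality $N_A/N = P(A)$ holds only approximately, which is the point made above about the ensemble picture.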

Moreover, among all the admissible probability laws $P$, Gibbs
chose a particular one, the canonical distribution. $P$ is absolutely
continuous with respect to the Lebesgue measure (based on
$d\omega = dq_1 \ldots dq_k\,dp_1 \ldots dp_k$, which is invariant under $T_t$ by
Liouville's theorem) and one has

(1.6) $dP = K e^{-\alpha E}\,d\omega$.

Here also the justification of the canonical distribution had to
wait several decades (1950): for a given value of the mean total
energy $E$, it is (1.6) which maximizes the information entropy of
$\Sigma$ [A.1.2].

Nevertheless, the introduction, though in an implicit and even
obscure way, of the random integrals $q_j(T_t\omega)$, $p_j(T_t\omega)$ was a clear
indication of the role that random integrals of differential equations
were to play in many physical problems.

Let us point out that the stationary random functions $F(T_t\omega)$
introduced in statistical mechanics define, in a sense, the most
general type of stationary random functions. J. L. Doob [G.1.1,
p. 509] proved that any stationary random function, defined on a
probability space $(\Omega, \mathcal{S}, P)$, can be put into the form $F(T_t\omega)$, where
$T_t$ is a group of suitable measure-preserving set transformations
(operating on the sets of the $\sigma$-algebra $\mathcal{S}$).

2. Ordinary Differential Equations. In Gibbs' statistical mechanics
randomness is introduced only through the initial conditions, but
many dynamical problems give the impression that some kind of
forces are acting at random. Thus, in the Hamilton-Jacobi equations,
the Hamiltonian $H$ has to be taken as a random function
of the $(p_j, q_j)$. In many cases this point of view has inspired
fruitful research (for an example, see N. M. Krylov and N. N.
Bogolyubov [A.2.14]).

One is thus led to the general problem:

Given a q-dimensional random vector field $F(t, x, \omega)$, find a q-dimensional
random vector $x(t, \omega)$ satisfying the q differential equations

(2.1) $\frac{dx}{dt} = F(t, x, \omega)$

and taking a given initial value $x(0, \omega) = x_0$ or even a random initial
value $x(0, \omega) = x_0(\omega)$.
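
A toy instance of problem (2.1) may help fix ideas. The sketch below assumes $q = 1$ and the very special random field $F(t, x, \omega) = -a(\omega)\,x$, where the random element reduces to a single rate $a$ drawn uniformly from $(0.5, 1.5)$; for each sample the equation is integrated by Euler's method and compared with the known solution $x_0 e^{-a t}$. The field, the law of $a$, and the function names are illustrative assumptions.

```python
import math
import random

def euler_path(F, x0, t_end, n_steps):
    """Integrate dx/dt = F(t, x) from 0 to t_end with the explicit Euler scheme."""
    h = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        x += h * F(t, x)
        t += h
    return x

def sample_solution(rng, x0=1.0, t_end=1.0, n_steps=1000):
    """Draw one sample omega (here a rate a) and solve the equation for it."""
    a = rng.uniform(0.5, 1.5)             # assumed law of the random rate
    F = lambda t, x: -a * x               # the sampled vector field F(t, x, omega)
    numeric = euler_path(F, x0, t_end, n_steps)
    exact = x0 * math.exp(-a * t_end)     # closed-form solution for this sample
    return a, numeric, exact

rng = random.Random(0)
samples = [sample_solution(rng) for _ in range(5)]
for a, numeric, exact in samples:
    print(f"a={a:.3f}  euler={numeric:.5f}  exact={exact:.5f}")
```

Each sample gives one trajectory; statistics of $x(t, \omega)$ such as its mean would be estimated by averaging over many samples.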

If $q = 3$ and if $F = U$ represents a random velocity field, this
problem is the exact mathematical formulation of the diffusion
problem in the turbulent flow of a fluid: the velocity $U$ being a
known random function of the time $t$ and the position $x$ (Eulerian
description of a flow), find the random trajectories of the particles
of the fluid (Lagrangian description). In spite of its great importance
one is still very far from the solution of this problem for a
large class of random vector fields $F$. In fluid dynamics one has to
be content with partial results. For instance, if the mean vector
$E[F(t, x, \omega)]$ and the covariance tensor* $E[F(t, x, \omega)F(s, y, \omega)]$ are
known, what can be said about the mean vector $E[x(t, \omega)]$ and
the covariance tensor $E[x(t, \omega)x(s, \omega)]$? Even in this limited field,
very few results have been obtained so far (see, from the viewpoint
of fluid dynamics, G. K. Batchelor [A.2.1, A.2.2]).

The only general results known are concerned with stationary
random functions and differential equations with constant
coefficients. K. Karhunen [A.2.13] has proved the following theorem.

If the equation $a_n \xi^n + \cdots + a_1 \xi + a_0 = 0$ has no purely
imaginary root $\xi = i\kappa$, and if $y(t, \omega)$ is a normal stationary random
function with an absolutely continuous spectrum

$dS_y(\kappa) = s_y(\kappa)\,d\kappa$,

then the differential equation

(2.2) $a_n \frac{d^n x}{dt^n} + \cdots + a_1 \frac{dx}{dt} + a_0 x = y(t, \omega)$

has a unique solution which is a normal stationary random function
with the absolutely continuous spectrum*

(2.3) $dS_x(\kappa) = \frac{s_y(\kappa)\,d\kappa}{|a_n (i\kappa)^n + \cdots + a_1 (i\kappa) + a_0|^2}$.

The normal stationary random function $y(t, \omega)$ being represented by
a Wiener integral*

(2.4) $y(t, \omega) = \int_{-\infty}^{+\infty} e^{i\kappa t} \sqrt{s_y(\kappa)}\,dW(\kappa, \omega)$,

then the random integral of (2.2) can be represented by

(2.5) $x(t, \omega) = \int_{-\infty}^{+\infty} \frac{e^{i\kappa t} \sqrt{s_y(\kappa)}\,dW(\kappa, \omega)}{a_n (i\kappa)^n + \cdots + a_1 (i\kappa) + a_0}$.

Both integrals are interpreted by Karhunen as limits in the mean;
but one can sharpen the result by using criteria for the almost sure
existence and continuity of the derivatives up to the order $n$ of a
normal stationary random function. In fact, if $S_y(\kappa)$ satisfies

$\int_{-\infty}^{+\infty} [\log(1 + |\kappa|)]^{1+\epsilon}\,dS_y(\kappa) < +\infty, \qquad \epsilon > 0$,

then (G. Hunt [A.2.12]) $y(t, \omega)$ is almost surely continuous.

But now (2.3) implies:

$\int_{-\infty}^{+\infty} \kappa^{2n} [\log(1 + |\kappa|)]^{1+\epsilon}\,dS_x(\kappa) < +\infty$.

Thus, due to the criterion$^{4}$ of Yu. K. Belyaev [A.2.3], $x(t, \omega)$ is almost
surely continuous and has almost surely continuous derivatives up
to the order $n$. Thus one can interpret (2.4) and (2.5) (see J.
Kampé de Fériet [C.2, C.3]) as almost surely existing and (2.5)
defines the random integral of (2.2) for almost all samples.

4 In Belyaev's formula $\epsilon$ is omitted; this contradicts G. Hunt’s criterion in the
particular case $n = 0$. J. Delporte, by a completely different method, has
established an equivalent criterion, but he finds the exponent $(1 + \epsilon)$ as was
to be expected [A.2.11].

For nonlinear equations we know of few results. One interesting
case was considered by R. H. Cameron [A.2.9, A.2.10]

(2.6) $\frac{dx}{dt} = F[t, x(t, \omega) + W(t, \omega)], \qquad x(0, \omega) = 0$.

The author gives two sets of conditions on $F(t, u)$ in the strip
$\{0 \le t \le 1, -\infty < u < +\infty\}$ sufficient for the almost sure
existence of a random integral $x(t, \omega)$ in the interval $0 \le t \le 1$.
Let us observe that if one puts

$y(t, \omega) = x(t, \omega) + W(t, \omega)$

then (2.6) is equivalent to the integral equation

$y(t, \omega) - \int_0^t F[s, y(s, \omega)]\,ds = W(t, \omega)$.

It is interesting to note that for $F(t, u) = u^2$, the conditions are
not satisfied and it has been proved by D. A. Woodward [A.2.9,
p. 840] that, in fact, for almost all $\omega$, $y(t, \omega)$ does not exist.

Of course the general equation (2.1) can also be written as a
Volterra integral equation

$x(t, \omega) = x(0, \omega) + \int_0^t F[s, x(s, \omega), \omega]\,ds$

which shows that the research of A. T. Bharucha-Reid [A.2.6, A.2.7,
A.2.8] on random solutions of integral equations can be useful in this
field.

3. Stochastic Differential Equations. In sharp contrast to the
scanty material of the preceding section, when we come to
stochastic differential equations we find an exuberant wealth of
material which will keep us busy for the next four sections.

The initial nucleus was a mathematical model of Brownian
motion. In 1828, a botanist, Robert Brown, observed that very
small solid particles suspended in a liquid perform completely
erratic motions. From further observations it became increasingly
clear that these irregular motions were an outward manifestation
of the motion of the molecules of the liquid. The turning point
came in 1905 when A. Einstein proved that the projection of the
particle, let us say on the $Ox$ axis, did follow a normal law with
mean $\bar{x} = 0$ and variance $\overline{x^2} = \sigma^2 t$. Seldom has a formula so
simple had such far-reaching consequences and become so popular.
In fact, it made possible the determination of the Avogadro number
(an accomplishment for which Jean Perrin was awarded the Nobel
Prize in 1926). In 1916 M. Smoluchowski gave a confirmation of
Einstein's ideas and improved on several points in his theory.
Then, in 1923, N. Wiener [A.3.15] gave a rigorous definition$^{6}$ of a
random function $W(t, \omega)$, which manifests the most typical features
observed in the Brownian motion:


(a) The displacement $\Delta x$ of the particle during the interval
$[t, t + \tau]$ being produced (if $\tau$ is large enough) by a very large number
of molecular impacts must, as a sum of a large number of independent
random variables, follow the normal law (central limit
theorem). This is true if $\Delta x = \sigma[W(t + \tau, \omega) - W(t, \omega)]$, even if $\tau$ is
small (this last circumstance is possibly annoying for the physicist!).

(b) The displacement during a given time interval depends only
on the molecular impulses acting during this interval. Thus the
displacements $\Delta_1 x$ and $\Delta_2 x$ during two nonoverlapping time intervals
$[t_1, t_1 + \tau_1]$, $[t_2, t_2 + \tau_2]$ must be independent. In fact, this
property is satisfied by $\Delta_1 x = \sigma[W(t_1 + \tau_1, \omega) - W(t_1, \omega)]$ and
$\Delta_2 x = \sigma[W(t_2 + \tau_2, \omega) - W(t_2, \omega)]$.

Thus the random function

(3.1)    $x = \sigma W(t, \omega)$

is a possible choice for a theoretical model of the Brownian motion
of a particle on the Ox axis; obviously, the Einstein formulas are
verified by the displacement $\sigma [W(t + \tau, \omega) - W(t, \omega)]$ during any
interval of time $[t, t + \tau]$.

But some properties of the random function $W$ are rather puzzling
for the physicist. First, $W(t, \omega)$ almost never has a derivative with
respect to $t$. Thus the velocity of the particle is not defined. Second,
$W(t, \omega)$ is almost never of bounded variation; the length of the
path during any finite interval of time is infinite.
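
Both puzzling properties can be observed on a sampled path. The following is only a minimal sketch, not a construction taken from the text: it assumes that $[0, 1]$ is cut into equal steps with independent normal increments, and the function names (`wiener_path`, `total_variation`, `quadratic_variation`) are ours. When the same path is observed on finer and finer grids, the sum of the absolute increments keeps growing, while the sum of their squares stays close to the length of the interval.

```python
import math
import random


def wiener_path(n_steps, t_end=1.0, seed=0):
    """Sample W at n_steps + 1 equally spaced times on [0, t_end]."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # Independent normal increments with mean 0 and variance dt.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def total_variation(path, stride=1):
    """Sum of |increments| when only every `stride`-th sample is observed."""
    pts = path[::stride]
    return sum(abs(b - a) for a, b in zip(pts, pts[1:]))


def quadratic_variation(path, stride=1):
    """Sum of squared increments on the same coarse grid."""
    pts = path[::stride]
    return sum((b - a) ** 2 for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    w = wiener_path(2 ** 14)
    for stride in (256, 16, 1):
        # Finer observation (smaller stride) gives a longer measured path.
        print(stride, total_variation(w, stride), quadratic_variation(w, stride))
```

On a grid of $n$ steps the expected sum of absolute increments is of the order of $\sqrt{n}$, which is the discrete trace of the infinite path length; the difference quotients, of order $1/\sqrt{\Delta t}$, likewise fail to settle to a velocity.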


⁶ In fact, this function was already considered in 1900 by L. Bachelier [A.3.1]
in quite a different context. He was looking for a mathematical model of the
fluctuations of the stock exchange! Many of the characteristic features of
$W(t, \omega)$ were discovered by L. Bachelier, but without rigorous proofs. The
theory of probability was in its infancy at that time and one had to be satisfied
with rather vague heuristic considerations.

In order to avoid these paradoxes, another approach has been
suggested by Ornstein and Uhlenbeck [A.3.14] and later by S.
Bernstein [A.3.3]. The position of the particle (starting from 0 at
time 0) is now defined by

(3.2)    $x(t, \omega) = \sigma \int_0^t e^{-\beta\xi}\, W(e^{2\beta\xi}, \omega)\, d\xi, \qquad \beta > 0.$

The velocity $u(t, \omega)$ is well defined at any time $t$ by

(3.3)    $u(t, \omega) = \sigma e^{-\beta t}\, W(e^{2\beta t}, \omega).$

This random function is normal, almost surely continuous in any
finite interval, stationary with mean

(3.4)    $E[u(t, \omega)] = 0,$

covariance

(3.5)    $E[u(t, \omega)\, u(s, \omega)] = \sigma^2 e^{-\beta |t - s|},$

and spectrum

(3.6)    $d\mathcal{S}_u = \frac{\sigma^2}{\pi}\, \frac{2\beta\, d\kappa}{\kappa^2 + \beta^2};$

moreover, it is a Markov process.
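
The stationarity of the velocity is not obvious from its definition as a time-changed Wiener function. The short check below is a sketch under one assumption stated in the code, namely the standard covariance $E[W(a)W(b)] = \min(a, b)$ of the Wiener function; the function names are ours. It computes the covariance of $u$ directly from (3.3) and compares it with the closed form (3.5).

```python
import math


def wiener_cov(a, b):
    """E[W(a) W(b)] = min(a, b) for the Wiener function (assumed normalization)."""
    return min(a, b)


def velocity_cov(t, s, sigma=1.0, beta=1.0):
    """E[u(t) u(s)] computed from u(t) = sigma * exp(-beta t) * W(exp(2 beta t))."""
    scale = sigma ** 2 * math.exp(-beta * (t + s))
    return scale * wiener_cov(math.exp(2.0 * beta * t), math.exp(2.0 * beta * s))


def stationary_cov(t, s, sigma=1.0, beta=1.0):
    """Closed form (3.5): sigma^2 * exp(-beta |t - s|)."""
    return sigma ** 2 * math.exp(-beta * abs(t - s))


if __name__ == "__main__":
    for t, s in [(0.3, 1.7), (2.0, 2.0), (-1.0, 0.5)]:
        print(t, s, velocity_cov(t, s), stationary_cov(t, s))
```

The two columns agree because $e^{-\beta(t+s)} \min(e^{2\beta t}, e^{2\beta s}) = e^{-\beta|t-s|}$: the exponential change of time exactly compensates the growth of the variance of $W$.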

In order to legitimize the model (3.2) for the Brownian motion,
let us write the equation of motion of the particle on the Ox axis

(3.7)    $\frac{du}{dt} = -\beta u + y(t, \omega).$

The influence of the liquid is split into two parts: a systematic
force, the friction $-\beta u$ (Stokes’ law); a fluctuating part, the random
force produced by the molecular impacts. If we take

$y(t, \omega \mid \tau) = \frac{\sigma}{\tau}\, [W(t + \tau, \omega) - W(t, \omega)],$

then the force $y$ is a normal stationary random function and (3.7)
is a particular case of (2.2). Thus there is a unique normal stationary
random integral $u(t, \omega)$; the spectrum of $y(t, \omega \mid \tau)$ being

$d\mathcal{S}_y = \frac{2\sigma^2 (1 - \cos \kappa\tau)}{\pi \kappa^2 \tau^2}\, d\kappa,$

the spectrum of $u(t, \omega)$ is given by

(3.8)    $d\mathcal{S}_u = \frac{2\sigma^2 (1 - \cos \kappa\tau)}{\pi \kappa^2 (\kappa^2 + \beta^2)\, \tau^2}\, d\kappa.$

Now if $\tau \to 0$, then $d\mathcal{S}_y \to \frac{\sigma^2}{\pi}\, d\kappa$, which does not define a spectrum
[this means that $y(t, \omega \mid \tau)$ does not tend to a limit; $W$ has no
derivative], but

$d\mathcal{S}_u \to \frac{\sigma^2}{\pi}\, \frac{d\kappa}{\kappa^2 + \beta^2},$

which is precisely the spectrum of the velocity in the Ornstein-
Uhlenbeck process. This limiting property of equation (3.7) is
usually expressed by the symbolic formula

(3.9)    $du = -\beta u\, dt + \sigma\, dW,$

called Langevin’s equation. One must carefully avoid dividing by
$dt$ because $dW/dt$ has no meaning. Formula (3.9) is a symbolic
substitute for the integral equation

(3.10)    $u(t, \omega) = u(0, \omega) - \beta \int_0^t u(s, \omega)\, ds + \sigma W(t, \omega).$

One finds in J. L. Doob [A.3.4] a rigorous mathematical theory of
Langevin’s equation from this point of view.*
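
The integral form of Langevin's equation lends itself to a step-by-step numerical experiment. What follows is a sketch only: it replaces the integral equation by an Euler scheme with a fixed step `dt` (our choice, not Doob's rigorous treatment), and the parameter values are illustrative. For the equation $du = -\beta u\,dt + \sigma\,dW$ the stationary variance is $\sigma^2 / (2\beta)$, which is also the integral over $\kappa > 0$ of the limiting spectrum $\frac{\sigma^2}{\pi}\frac{d\kappa}{\kappa^2 + \beta^2}$; the time average of $u^2$ along a long simulated path should come out close to that value.

```python
import math
import random


def simulate_langevin(beta, sigma, dt, n_steps, u0=0.0, seed=0):
    """Euler scheme: u(t + dt) = u(t) - beta u(t) dt + sigma [W(t + dt) - W(t)]."""
    rng = random.Random(seed)
    u = u0
    path = [u]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment of W over one step
        u = u - beta * u * dt + sigma * dw
        path.append(u)
    return path


def time_average_of_square(path):
    """Average of u^2 along the path, an estimate of the stationary variance."""
    return sum(v * v for v in path) / len(path)


if __name__ == "__main__":
    beta, sigma = 1.0, 1.0
    path = simulate_langevin(beta, sigma, dt=0.01, n_steps=200_000)
    print("time average of u^2:", time_average_of_square(path))
    print("sigma^2 / (2 beta): ", sigma ** 2 / (2.0 * beta))
```

Note that the scheme never divides the increment of $W$ by `dt`; only the increment itself enters, in keeping with the warning that $dW/dt$ has no meaning.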

The generalization of Langevin’s equation introduced the following
standard type of stochastic differential equation:

(3.11)    $dx = m(t, x)\, dt + \sigma(t, x)\, dW.$

The meaning of the functions $m$ and $\sigma$ is given (J. L. Doob [G.1.1,
p. 275]) by the limits of two conditional expectations:

$\lim_{h \downarrow 0} E\left[ \frac{x(t + h, \omega) - x(t, \omega)}{h} \,\Big|\, x(t, \omega) = \xi \right] = m(t, \xi),$

$\lim_{h \downarrow 0} E\left[ \frac{[x(t + h, \omega) - x(t, \omega)]^2}{h} \,\Big|\, x(t, \omega) = \xi \right] = \sigma(t, \xi)^2.$


* S. Bernstein [A.3.3] defines Langevin’s equation as the limit of a finite difference
equation. M. Kac [A.3.8] gives a theory of Langevin’s equation and of a
more general equation (with external forces, such as gravity, acting on the
particle) as the limit of a random walk.

The fact that, when $x(t, \omega)$ is known, the mean and variance of the
increment $x(t + h, \omega) - x(t, \omega)$ are both of the order of $h$ explains
why P. Lévy [G.1.3] always writes the stochastic differential equation
(3.11) in the following way:

$dx = m(t, x)\, dt + \sigma(t, x)\, X \sqrt{dt},$

$X$ being, for each $t$, a normal random variable with mean 0 and
variance 1. Under some regularity conditions for $m$ and $\sigma$, the
solution of (3.11) is an almost surely continuous Markov process.
The effective construction of Markov processes from stochastic
differential equations has been an outstanding contribution of Paul
Lévy to the theory of random functions.
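
Lévy's way of writing the equation reads almost directly as a recipe for building a sample step by step. The sketch below takes it literally with a finite step in place of $dt$; this discretization is usually called the Euler-Maruyama scheme, and it is our illustration rather than Lévy's or Ito's construction. The function name `euler_maruyama` and the coefficient functions passed to it are hypothetical.

```python
import math
import random


def euler_maruyama(m, sigma, x0, t0, t1, n_steps, seed=0):
    """Advance dx = m(t, x) dt + sigma(t, x) X sqrt(dt), with X normal (0, 1)."""
    rng = random.Random(seed)
    dt = (t1 - t0) / n_steps
    t, x = t0, x0
    path = [(t, x)]
    for _ in range(n_steps):
        big_x = rng.gauss(0.0, 1.0)  # a fresh standard normal variable per step
        x = x + m(t, x) * dt + sigma(t, x) * big_x * math.sqrt(dt)
        t = t + dt
        path.append((t, x))
    return path


if __name__ == "__main__":
    # Langevin's equation as a special case: m = -beta x, sigma constant.
    beta, s = 2.0, 0.5
    path = euler_maruyama(lambda t, x: -beta * x, lambda t, x: s,
                          x0=1.0, t0=0.0, t1=1.0, n_steps=1000)
    print("x(1) for this sample:", path[-1][1])
```

Each step uses only the current value of $x$ and a new independent normal variable, which is the discrete counterpart of the Markov property of the solution. With $\sigma \equiv 0$ the scheme reduces to Euler's method for the ordinary differential equation $dx/dt = m(t, x)$.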

Here let us make an amusing semantic point: the denominations 
“random functions” and “stochastic (or random) processes” are 
both used in the literature; the choice of one or the other by an 
author is motivated by his intuitive approach to the theory. 

Let us consider a given set of functions $x(t, \omega)$ depending on a
random parameter. If we pick a particular sample, at random,
we have at our disposal a function which is already known in its
whole interval of definition: “random function” clearly expresses
this idea. If we start from a stochastic differential equation, we
assume that we proceed to build the function step by step. When
we know it up to a given time $t$, the equation defines the increment
$\Delta x$ for the next step; the word “process” expresses perfectly this
knowledge progressing continuously from $t$ to $t + \Delta t$ (see [G.1.3,
pp. 27 and 28]). This explains why Paul Lévy has
always remained faithful to “les processus stochastiques” and why
we, having been introduced to the field through statistical mechanics
(a set of given trajectories), are inclined towards “les fonctions
aléatoires.”⁷

The main result on the standard type of stochastic differential
equations is due to K. Ito [A.3.7]. He makes the following
hypotheses:

($H_1$) $m(t, \xi)$ and $\sigma(t, \xi)$ are Baire functions in the strip

$\{(t, \xi) : 0 \le t \le 1,\ -\infty < \xi < +\infty\}.$

⁷ Let us point out that we are not always consistent; for instance, we say
“Markov process,” bowing to common usage. To be self-consistent in our
terminology, we should call them “random functions of the Markov type”
(as does, for instance, R. Fortet [A.4.2]); we hope that the reader will pardon
this venial sin.

($H_2$) In this strip

$|m(t, \xi)| \le K (1 + \xi^2)^{1/2},$

$0 \le \sigma(t, \xi) \le K (1 + \xi^2)^{1/2},$

$K$ constant.

($H_3$) In the strip there hold the uniform Lipschitz conditions

$|m(t, \xi_2) - m(t, \xi_1)| \le K |\xi_2 - \xi_1|,$

$|\sigma(t, \xi_2) - \sigma(t, \xi_1)| \le K |\xi_2 - \xi_1|.$

Then, replacing (3.11) by the integral equation

$x(t, \omega) - x(0, \omega) = \int_0^t m[s, x(s, \omega)]\, ds + \int_0^t \sigma[s, x(s, \omega)]\, dW(s, \omega)$

(the second integral is a new type of stochastic integral, whose
meaning has been previously defined by K. Ito [A.3.5]), K. Ito gives
an effective construction (by successive approximations) of the
unique solution $x(t, \omega)$, which is an almost surely continuous
Markov process for $t \in [0, 1]$. Moreover, for each $t \in [0, 1]$,
$x(t, \omega) - x(0, \omega)$ is independent of all the increments $W(s + h, \omega) -
W(s, \omega)$, if $t \le s < s + h \le 1$.

In a recent paper A. V. Skorohod [A.3.13] has split the existence
and uniqueness theorem into two parts; for the existence of an
almost surely continuous Markov process in $[0, 1]$ the conditions

($H_1'$) $m(t, \xi)$ and $\sigma(t, \xi)$ are continuous in the strip,
($H_2$) as above,

are sufficient.

4. Transition Probability and Diffusion Equations. In the preceding
section the construction of the Markov process was based
on the direct consideration of the stochastic differential equations,
but there is another approach which has inspired a large amount of
research. This other method defines $x(t, \omega)$ by the transition
probability $P(s, \xi, t, \eta)$. It is supposed that $P$ satisfies a kind of
Lindeberg condition

$\int_{|\eta - \xi| > \varepsilon} d_\eta P(s, \xi, t, \eta) = \mathrm{Prob}\,[\,|x(t, \omega) - x(s, \omega)| > \varepsilon \mid x(s, \omega) = \xi\,] = o(t - s), \qquad 0 \le s < t,$

for every $\varepsilon > 0$. The functions $m(t, \xi)$ and $\sigma(t, \xi)$ are now defined
by

$m(t, \xi) = \lim_{s \to t - 0} \frac{1}{t - s} \int_{|\eta - \xi| \le \varepsilon} (\eta - \xi)\, d_\eta P(s, \xi, t, \eta),$

$\sigma(t, \xi)^2 = \lim_{s \to t - 0} \frac{1}{t - s} \int_{|\eta - \xi| \le \varepsilon} (\eta - \xi)^2\, d_\eta P(s, \xi, t, \eta).$

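In 1931 A. N. Kolmogoroff [A.4.5] proved that under suitable
regularity conditions, the transition probability satisfies the backward
diffusion equation (with respect to the past conditions $s, \xi$)

(4.1)    $\frac{\partial P}{\partial s} + m(s, \xi)\, \frac{\partial P}{\partial \xi} + \frac{1}{2}\, \sigma(s, \xi)^2\, \frac{\partial^2 P}{\partial \xi^2} = 0,$

and the forward equation (with respect to the present conditions $t, \eta$)

(4.2)    $\frac{\partial p}{\partial t} + \frac{\partial}{\partial \eta} [m(t, \eta)\, p] - \frac{1}{2}\, \frac{\partial^2}{\partial \eta^2} [\sigma(t, \eta)^2\, p] = 0,$

where $p(s, \xi, t, \eta) = \frac{\partial}{\partial \eta} P(s, \xi, t, \eta)$.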

The first is a parabolic equation with respect to the variables
$(s, \xi)$ for $s < t$; $(t, \eta)$ enters only from the initial condition

(4.3)    $P(t, \xi, t, \eta) = 1$ for $\xi < \eta$,    $= 0$ for $\xi > \eta$.

The second is also a parabolic equation with respect to $(t, \eta)$ for
$t > s$; $(s, \xi)$ enters only from a similar condition

(4.4)    $P(s, \xi, s, \eta) = 1$ for $\eta > \xi$,    $= 0$ for $\eta < \xi$.

In particular cases the forward equation had been known a long
time previously. L. Bachelier [A.3.1] used the heat equation
($m = 0$, $\sigma = 1$) extensively in connection with $W(t, \omega)$. A. Einstein,
M. Smoluchowski, Fokker, M. Planck, and others, considered the
case $m$ = constant, $\sigma$ = constant.
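
For the heat-equation case $m = 0$, $\sigma = 1$ the transition density is the normal density with mean $\xi$ and variance $t - s$, and both diffusion equations can be checked numerically. The sketch below is an illustration under our own choices (central finite differences, step `h`, function names); it evaluates the residuals of the forward equation $\partial p/\partial t - \frac{1}{2}\partial^2 p/\partial \eta^2 = 0$ and of the backward equation $\partial p/\partial s + \frac{1}{2}\partial^2 p/\partial \xi^2 = 0$. The backward equation is stated in the text for $P$; it holds for the density as well, since differentiation with respect to $\eta$ commutes with the operators in $(s, \xi)$.

```python
import math


def gaussian_density(s, xi, t, eta):
    """Transition density of W: normal law with mean xi and variance t - s."""
    var = t - s
    return math.exp(-(eta - xi) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_residual(s, xi, t, eta, h=1e-3):
    """Central-difference estimate of dp/dt - (1/2) d^2 p / d eta^2."""
    dp_dt = (gaussian_density(s, xi, t + h, eta)
             - gaussian_density(s, xi, t - h, eta)) / (2.0 * h)
    d2p = (gaussian_density(s, xi, t, eta + h)
           - 2.0 * gaussian_density(s, xi, t, eta)
           + gaussian_density(s, xi, t, eta - h)) / h ** 2
    return dp_dt - 0.5 * d2p


def backward_residual(s, xi, t, eta, h=1e-3):
    """Central-difference estimate of dp/ds + (1/2) d^2 p / d xi^2."""
    dp_ds = (gaussian_density(s + h, xi, t, eta)
             - gaussian_density(s - h, xi, t, eta)) / (2.0 * h)
    d2p = (gaussian_density(s, xi + h, t, eta)
           - 2.0 * gaussian_density(s, xi, t, eta)
           + gaussian_density(s, xi - h, t, eta)) / h ** 2
    return dp_ds + 0.5 * d2p


if __name__ == "__main__":
    print("forward residual: ", forward_residual(0.0, 0.0, 1.0, 0.5))
    print("backward residual:", backward_residual(0.0, 0.0, 1.0, 0.5))
```

Both residuals are small compared with the individual terms, the remainder being the discretization error of the finite differences.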

W. Feller [A.4.1] proved, under suitable regularity conditions
for $m(t, \xi)$ and $\sigma(t, \xi)$, that $P(s, \xi, t, \eta)$ is uniquely determined by
equations (4.1) and (4.2) and the initial conditions (4.3) and (4.4).
R. Fortet [A.4.2] has established the almost sure continuity of the
Markov process defined by $P(s, \xi, t, \eta)$ when Feller's conditions are
satisfied; more precisely he has shown that, almost surely,

$|x(t'', \omega) - x(t', \omega)| < 2C \left[\, |t'' - t'| \log \frac{1}{|t'' - t'|} \right]^{1/2}$

for all $C > 1$, provided $|t'' - t'| < \delta(\omega)$, the random variable $\delta$
being independent of $t'$ and $t''$.

R. Fuchs [A.4.3, pp. 192-193] refined the properties of the modulus
of continuity of $x(t, \omega)$ and proved a proposition which is, in a
sense, the converse of the preceding result, where, starting from given
$m(t, \xi)$, $\sigma(t, \xi)$, one concludes the existence and continuity of
$x(t, \omega)$. R. Fuchs proves that for every almost surely globally
continuous Markov process (globally meaning a somewhat stronger
kind of continuity, which he defines) the functions $m(t, \xi)$, $\sigma(t, \xi)$
exist for almost all $t$ in $[0, 1]$.

5. Semigroups and Infinitely Divisible Laws. In the stochastic
differential equation (3.11), the random impulse is represented by
$dW$; in doing so we keep the spirit of Langevin's equation for
Brownian motion. But it is easy to observe that, for the step-by-step
construction of the Markov process $x(t, \omega)$, the most important
feature of $W(t, \omega)$ is not the normal law but the independence of its
increments. This points to a much more general type of stochastic
differential equation,

(5.1)    $dx = m(t, x)\, dt + \sigma(t, x)\, dV,$

where the random impulse $dV$ is now generated by the most general
random function $V(t, \omega)$ with independent increments. $W(t, \omega)$ is
simply a particular case of this class of random functions. Another
well-known but quite different example is the Poisson process (the
samples are step functions).

P. Lévy (see, for example, [G.1.3, pp. 147-203]) has proved that,
after the subtraction of a suitable given function (not random), the
function $V(t, \omega)$ can always be reduced to a random function with
independent increments having no fixed singularities (centering of
the process, according to modern nomenclature). For this last type
the increment $V(t, \omega) - V(s, \omega)$ must follow an infinitely divisible
law. The general structure of these laws was discovered by P. Lévy;
see [G.1.3, p. 161], where the general expression of the characteristic
function of an infinitely divisible law is given explicitly.

For the separable version of $V(t, \omega)$ almost all samples are
bounded on every finite interval $a \le t \le b$ and their discontinuities
are jumps.

The fundamental existence and uniqueness theorem has been
proved by K. Ito [A.3.7] explicitly for the generalized standard type
(5.1). Under the conditions ($H_1$), ($H_2$), and ($H_3$) the solution
$x(t, \omega)$ is still a Markov process, but the samples are almost surely
discontinuous.

As in the particular case of Section 4, the determination of the
transition probability has been the subject of extensive research;
W. Feller [A.4.1] has proved that the backward diffusion equation
(4.1) becomes an integro-differential equation which reduces to a
partial differential equation only in the continuous case.

The composition law of the transition probability has opened the
door to the application of the theory of semigroups. Since the
fundamental paper of K. Yosida (1949) [A.5.8] considerable work
has been done and this has become one of the most fashionable spots
in the field of Markov processes (W. Feller (1954-1955) [A.5.3,
A.5.4], E. Hille and R. S. Phillips (1957) [A.5.6], A. Neveu (1958)
[A.5.7], E. B. Dynkin (1959) [A.5.1, A.5.2], to mention only a few).

To give one example, we refer the reader to pp. 648-660 of E. Hille
and R. S. Phillips [A.5.6] where semigroup theory is applied to the
particular case of Markov processes homogeneous in space and
time, that is,

(5.2)    $P(s, \xi, t, \eta) = P(t - s, \eta - \xi).$

Through the Chapman-Kolmogorov equation, written as a
convolution,

$P(t_1 + t_2, x) = P(t_1, x) * P(t_2, x),$

the transition probability is interpreted as a semigroup of transformations
acting in $L(-\infty, +\infty)$. The choice of this function
space is motivated by the formula

$P(t, \cdot) F(x) = \int_{-\infty}^{+\infty} F(x - \eta)\, d_\eta P(t, \eta),$

giving the probability law of the process at the time $t$, when the
initial probability is defined by $F(x)$.

Using the general methods of semigroup theory, the general
expression of the characteristic function $\varphi(t, z)$ corresponding to
$P(t, x)$ is found, giving a new proof of Paul Lévy’s main result on
infinitely divisible laws:

(5.3)    $\frac{1}{t} \log \varphi(t, z) = imz - \frac{\sigma^2 z^2}{2} + \int_{-\infty}^{+\infty} \left( e^{izx} - 1 - \frac{izx}{1 + x^2} \right) \frac{1 + x^2}{x^2}\, dG_0(x),$


where $m$ and $\sigma$ are real constants ($\sigma \ge 0$) and $G_0$ is bounded,
nondecreasing and continuous at $x = 0$ (the case $G_0 = 0$ corresponds
to the normal law). Moreover, for any function $f$ such that $f$, $f'$
are absolutely continuous and $f$, $f'' \in L(-\infty, +\infty)$ the infinitesimal
generator $A$ (in the strong topology) of the semigroup of
transformations $P(t, \cdot) f$ has the following expression:

(5.4)    $[Af](x) = -m f'(x) + \frac{\sigma^2}{2} f''(x) + \int_{-\infty}^{+\infty} \left[ f(x - y) - f(x) + \frac{y}{1 + y^2} f'(x) \right] \frac{1 + y^2}{y^2}\, dG_0(y).$
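
In terms of characteristic functions the convolution law becomes a product, $\varphi(t_1 + t_2, z) = \varphi(t_1, z)\, \varphi(t_2, z)$, which is why $\log \varphi(t, z)$ is proportional to $t$. The sketch below checks this on the two examples of independent increments named earlier, a normal part and Poisson jumps, combined in one exponent. It is an illustration under assumptions of our own: the parameter values are arbitrary, the jumps all have the same size, and the Poisson term is written without the centering term $izx/(1 + x^2)$, which only shifts the constant $m$.

```python
import cmath


def log_char_per_unit_time(z, m=0.3, sigma=0.8, rate=2.0, jump=1.0):
    """(1/t) log phi(t, z) for drift m, normal part sigma, and Poisson jumps."""
    normal_part = 1j * m * z - 0.5 * sigma ** 2 * z ** 2
    poisson_part = rate * (cmath.exp(1j * z * jump) - 1.0)
    return normal_part + poisson_part


def char_fn(t, z, **params):
    """Characteristic function of the increment over a time span t."""
    return cmath.exp(t * log_char_per_unit_time(z, **params))


if __name__ == "__main__":
    z = 1.3
    lhs = char_fn(0.7, z)
    rhs = char_fn(0.2, z) * char_fn(0.5, z)
    print("phi(0.7, z):            ", lhs)
    print("phi(0.2, z) phi(0.5, z):", rhs)
```

Setting `rate=0.0` leaves only the normal law, the case $G_0 = 0$ of the text.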


This gives a new interpretation of the forward diffusion equation
(4.2) in the particular case $G_0 = 0$ and shows clearly that in the
general case of (5.1) the forward diffusion equation has to be replaced
by the integro-differential equation

$f_t = Af.$


The great importance of the infinitesimal generator (strong or 
weak) is due to the fact that it completely determines the Markov 
process [A.5.1, p. 31]. 

Most of the results obtained in this way are of great importance 
for the theory of Markov processes, but are not within the scope of 
the present survey, having no direct bearing on the stochastic
differential equations.

6. Generalizations. As a generalization of (5.1) one can consider
the system of $q$ stochastic differential equations

(6.1)    $dx = m(t, x)\, dt + S(t, x)\, dV,$

where $x$ and $V$ are $q$-dimensional random vectors

$x = [x_1(t, \omega), \ldots, x_q(t, \omega)],$

$V = [V_1(t, \omega), \ldots, V_q(t, \omega)];$

the $q$ random functions $V_j(t, \omega)$ having independent increments; the
vector $m(t, \xi)$ and the matrix $S(t, \xi)$ being defined on the $(q + 1)$-
dimensional set $0 \le t \le 1$, $\xi \in R^q$.

Both methods (direct construction of the $q$-dimensional Markov
process $x(t, \omega)$ and determination of the transition probabilities from
the generalized Kolmogorov equations) can be used without any
essentially new difficulty (see [A.6.1]).

A very interesting approach from the physical viewpoint considers
stochastic differential equations with vanishing diffusion, that is,
the system

$dx = m(t, x)\, dt + \varepsilon S(t, x)\, dW,$

where the parameter $\varepsilon$ is small. The method of Yu. N. Blagoveshchenskii
[A.6.1] is based on an asymptotic expansion of the
vector $x(t, \omega)$ in a power series in $\varepsilon$, the first term being the solution
of the system of $q$ ordinary differential equations corresponding to
$\varepsilon = 0$. Curiously enough, the first approximation is Markovian,
but not the $k$th for $k \ge 2$.

I. I. Gihman [A.6.3] has broadened the meaning of (6.1), dropping
the hypothesis of independence for the increments $V_j(t, \omega) -
V_j(s, \omega)$. He also supposes that the $dV_j$ depend not only on the
time $t$, but also on some space variable (possibly even infinite
dimensional). This last hypothesis is motivated by the consideration
of dynamical systems under the influence of random disturbances
acting differently at different points of the phase space $\Omega$.
The integral of the system (6.1) is thus defined (in the spirit of
S. Bernstein [A.3.3]) as the limit (in the sense of convergence in
mean square) of a sequence of functionals defined by recursive
formulas.

A completely different type of generalization is obtained in the
following way: In the preceding sections the Brownian motion function
and, more generally, the random functions with independent
increments took their values in $R$ first and, now in this section, in
$R^q$; but the Brownian motion has been extended successfully to
non-Euclidean space (see P. Lévy [G.1.3, pp. 194-203]). In particular,
the general structure of the functions with independent
increments is completely determined for the circle (pp. 186-191) and
for the sphere (pp. 202-203). Thus a new type of stochastic differential
equation can be considered where the random function
$x(t, \omega)$ now takes its values in a non-Euclidean space.

Along this line K. Yosida [A.6.4] has extended the theory to the
forward and backward diffusion equations for the transition probabilities
in a connected domain of an $n$-dimensional orientable
Riemannian space.

PART B 

7. Statistical Theory of Turbulence. It has been known, at least
since the beginning of the nineteenth century, that the flow of water
from a reservoir through a long circular pipe can have two completely
different aspects; if the flow is slow enough, the trajectories
are parallel straight lines and the velocity at a given point remains
steady and varies smoothly from one point to another. If, however,
the flow is fast enough, the trajectories become extremely complicated
and the velocity displays variations in space and time which
change unpredictably. Moreover, it has been found (H. Poiseuille,
1851) that in the first case, that of laminar flow, the experimental
measurements are in very good agreement with the theoretical
results deduced from the Navier-Stokes equations (1822-1846)
for incompressible viscous fluids. On the contrary, in a turbulent
flow the loss of pressure along the pipe can be 100 or even 1000
times greater than the value predicted from the equations. It was
recognized as early as 1872 by J. Boussinesq that the chaotic nature
of the flow was such that, ignoring the fluctuations, one must deal
only with averages. This remark is the beginning of the statistical
theory of turbulence. O. Reynolds (1895) went further; by averaging
the Navier-Stokes equations, he gave the equations connecting
the mean values of the components $u_j$ of the velocity field and the
mean values of the products $u_j u_k$ (Reynolds tensor). It must be
borne in mind that for J. Boussinesq and even today for most
physicists, the averages are always taken with respect to time $t$
(at a fixed point $x$) on a given realization of the flow (a sample of the
ensemble of flows; see F. N. Frenkiel [B.7.4] for a careful comparison
of the different types of averages commonly used). It was only
much later, around 1930, that it was understood that, as in J. W.
Gibbs’ statistical mechanics of holonomic systems, one must compute
the averages over a very large ensemble of possible realizations
of the turbulent flow. In the language of probability we say that
the velocity field is a random velocity field $u(x, t, \omega)$, the phase space
$\Omega$ and the probability measure $P$ being suitably chosen.

At the beginning (for a general historical background see [B.7.1
or B.7.2]) the infant theory was confronted with many problems:
jets, wakes, boundary layers, and so on. But, about 1930, a problem
slowly emerged whose mathematical formulation did not seem to be
hopeless; that is, the motion of a fluid when the walls are so far
apart that their influence is negligible (mathematically the fluid fills
the whole space). The Navier-Stokes equations being invariant under
the translations $x \to x + h$, it is clear that one can expect the
simplest statistical results if one supposes that all statistical averages
are invariant under the translations or, more precisely, that the
velocity $u(x, t, \omega)$ is, at each time $t$, a stationary random vector field
with respect to $x$ (homogeneous turbulence).

One of the first approaches to the construction of such stationary
random vector fields has been described by N. Wiener, under the
picturesque name of “homogeneous chaos” [B.7.20].

The main idea consists in the representation of a random variable
$F(\omega)$ belonging to $L^2(\Omega)$ as the limit in quadratic mean of a sum

$\sum_n P_n(\omega),$

where $P_n(\omega)$, called a homogeneous polynomial chaos, is itself
a sum of products of Hermite polynomials

$\sum_{n_1 + \cdots + n_k = n} A_{n_1, \ldots, n_k} \prod_{j=1}^{k} H_{n_j}[W(I_j, \omega)],$

$W$ being the Wiener function and the intervals $I_1, \ldots, I_k$ being
nonoverlapping. Wiener introduced his theory explicitly in view
of statistical problems and with special reference to homogeneous
turbulence. Nevertheless, it has never been exploited in full and
only recently has it again attracted attention [B.8.10]. The main
point of Wiener’s chaos is that one does not have to make any
a priori hypothesis about the probability law of $F(\omega)$. The coefficients
$A_{n_1, \ldots, n_k}$ have to be computed in each particular problem
and, when they are known, they determine the probability law.
Of course, only the first homogeneous polynomial chaos follows the
normal law. The sum of the other polynomials ($n \ge 2$) measures
the discrepancy of the actual law from the normal.
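
What makes such an expansion workable is the orthogonality of the Hermite polynomials with respect to the normal law: chaoses of different degree are uncorrelated, so the coefficients can be computed one degree at a time. The sketch below checks this in the simplest case of a single standard normal variable. It rests on assumptions of ours: the probabilists' normalization $He_n$ (Wiener's own normalization may differ by constant factors), a plain trapezoidal rule for the expectation, and hypothetical function names.

```python
import math


def hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x): He_{k+1} = x He_k - k He_{k-1}."""
    prev, curr = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, curr = curr, x * curr - k * prev
    return curr


def normal_expectation(f, half_width=12.0, n_points=24001):
    """Trapezoidal estimate of E[f(X)] for a standard normal variable X."""
    h = 2.0 * half_width / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


def inner_product(m, n):
    """E[He_m(X) He_n(X)], which equals n! when m = n and 0 otherwise."""
    return normal_expectation(lambda x: hermite(m, x) * hermite(n, x))


if __name__ == "__main__":
    for m in range(4):
        print([round(inner_product(m, n), 6) for n in range(4)])
```

Only $He_1(X) = X$ is itself normal; the terms of degree two and higher carry the departure from the normal law, as stated above.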

So far the theory has developed along another line, concentrating
on the moments

(7.1)    $E[u_{j_1}(x_1, t, \omega) \cdots u_{j_n}(x_n, t, \omega)],$

which are the more easily obtained experimental data. Here, the
correlation tensor

(7.2)    $R_{j,k}(x - y) = E[u_j(x, t, \omega)\, u_k(y, t, \omega)]$

and its Fourier transform, the spectral tensor $S_{j,k}(\mathbf{k})$, play the fundamental
role.⁸ In [B.7.3] G. Birkhoff and J. Kampé de Fériet tried
to put the theory in a rigorous measure-theoretical setting. We
consider the linear topological⁹ space A of all $q$-dimensional vector
fields $u(x)$ defined on $R^p$ which are square integrable on any compact
set $D$:

$E_D = \int_D |u(x)|^2\, dm(x) < +\infty, \qquad m$ = Lebesgue measure on $R^p$.

We call a measure on a linear topological space regular if it is the
Lebesgue completion of a measure defined on the Borel sets. A
regular measure $P$ on A is admissible if the kinetic energy $E_D$ has
finite expectation for any compact $D$. We prove that any regular
measure is strictly separable and metrically separable and establish
the connection between admissible measures and random vector
fields measurable with respect to $P \times m$. We introduce the spectral
matrix measure $S_{j,k}(\mathbf{k})$ and the correlation matrix $R_{j,k}(\mathbf{h})$ and
prove that the correlation matrix of each admissible homogeneous
vector field is associated with one and only one spectral matrix
measure and conversely. We establish a uniqueness theorem proving
that there is a one-one correspondence between the class of
spectral matrix measures satisfying the symmetry condition
$S_{j,k}(\mathbf{k}) = S_{j,k}(-\mathbf{k})$ and the class of normal admissible random vector
fields. Finally, we prove that any normal admissible stationary
random vector field having reflection symmetry, whose spectrum is
absolutely continuous, is metrically transitive, that is, the statistical
averages and the averages computed in the space $R^p$ are equal with
probability one.

⁸ In the one-dimensional case the correlation and the energy spectrum were
introduced by G. I. Taylor; generalized harmonic analysis [B.7.19] was explicitly
suggested to N. Wiener by these ideas of G. I. Taylor. The definition of the
correlation tensor is due to von Kármán and Howarth (1938). The spectral
tensor was introduced in 1948 by G. K. Batchelor and J. Kampé de Fériet. Our
paper [B.7.9] was an attempt to put the physical intuitions of Taylor in the
mathematical frame of the newborn theory of random functions (1939).

⁹ A is topologized by the condition

$u_n(x) \to u(x)$ means $\int_D |u_n(x) - u(x)|^2\, dm(x) \to 0$

for all compact $D$.

From this résumé, compressing into a few lines forty years of
research, we conclude that the kinematics of homogeneous turbulence
is now in good mathematical standing. However, this is only
the beginning of a statistical theory of turbulence. According to
the views of statistical mechanics, we know the phase space $\Omega$, that
is, the set of vector fields which can represent, at a given time $t$, the
state of the system (defined by the velocity field); now we have to
determine the trajectory in $\Omega$, starting from an arbitrary initial
state.

Let us admit that any motion, even turbulent, of an incompressible
viscous fluid must satisfy the Navier-Stokes equations (this is a
very controversial hypothesis and, to our knowledge, no crucial 
argument pro or con has definitely closed the discussion). Then 
the answer is straightforward, that is, the evolution must be 
described by a random integral of the Navier-Stokes equations. 

This sounds clear and simple, but unfortunately the theory of 
random integrals of Navier-Stokes equations still belongs to the 
realm of wishful thinking! 

We know very little about the integrals of these nonlinear equations.
Until recently, only elementary integrals (with special symmetry
properties) were obtained. In 1933 the first existence and
uniqueness theorems were proved by J. Leray. In the last ten
years the field has become very active, and great progress has been
made through the work of D. Graffi, E. Hopf, J. L. Lions, G. Prodi,
I. A. Wolska-Bochanek, and O. A. Ladyshenskaya. Even if we
are now able to escape the pitfall of branching or exploding integrals,
one fact remains: a set of integrals is never a linear
space, which makes the discovery of significant topologies and measures
in the phase space $\Omega$ very difficult.

As long as our statistical theory of turbulence is fundamentally
based on Navier-Stokes equations, the success of our quest depends
on the progress of our ideas on the random integrals of these equations.
The road leading to the construction of these integrals
seeming barred, most research has been oriented toward the
moments (7.1) and their evolution with time $t$.

One uses a method which basically goes back to O. Reynolds:
Starting from the Navier-Stokes equations written for the velocity
field $u(x, t, \omega)$, one takes their scalar product with the vector field
$u(y, t, \omega)$, and averaging (with respect to $\omega$), one obtains the values
of the derivatives

$\frac{\partial}{\partial t} R_{j,k}$

(the so-called Kármán equations for the propagation of correlation).
These time derivatives are expressed in terms of the space
derivatives of moments of order 3. By a new application of the
same process we can compute the time derivatives of these moments
of order 3, but now they are defined in terms of the space derivatives
of moments of order 4 and so on, and this goes on indefinitely.
This chain reaction of moments of higher order on the evolution of
moments of smaller order is known as the closure problem of turbulence
theory (see Kraichnan [B.7.14]); it was already confronted by
G. I. Taylor in 1935 [B.7.19]. The unavailability of a method of
solution of this countable number of equations, simultaneously
describing the evolution of all the moments, has suggested many
approximate solutions. One of the most popular is the quasi-normality
hypothesis from Millionstchikov [B.7.15]. The hypothesis
is made that the moments of fourth order are in the same relation
to the correlation as for a normal random vector field. This approximation
has been systematically applied by Proudman and Reid
[B.7.16] and Tatsumi [B.7.17].
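
For a normal random vector with mean zero the relation in question is explicit: a fourth-order moment is the sum of the three products of second-order moments obtained by pairing the four factors in all possible ways. The sketch below writes this relation as a small function. The name `quasi_normal_fourth_moment` and the table `r` of second moments are hypothetical; in the turbulence setting each index would stand for a pair (component, point), and `r` would be filled from the correlation tensor.

```python
def quasi_normal_fourth_moment(r, i, j, k, l):
    """E[a_i a_j a_k a_l] for a zero-mean normal vector with E[a_p a_q] = r[p][q]."""
    return r[i][j] * r[k][l] + r[i][k] * r[j][l] + r[i][l] * r[j][k]


if __name__ == "__main__":
    # Illustrative second moments of two correlated zero-mean variables.
    r = [[2.0, 0.5],
         [0.5, 1.0]]
    print("E[a0^4]      =", quasi_normal_fourth_moment(r, 0, 0, 0, 0))
    print("E[a0^2 a1^2] =", quasi_normal_fourth_moment(r, 0, 0, 1, 1))
```

Substituting this expression for the fourth-order moments closes the chain of equations at order 3, at the price of a hypothesis which an actual turbulent field satisfies only approximately.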

The most heroic cure to cut the Gordian knot is the introduction
by E. Hopf [B.7.5] of the characteristic functional, getting rid, once
and for all, of the moments; $v^*(u)$ being an arbitrary linear functional
of the velocity field $u(x, t, \omega)$, the characteristic functional is
defined by

$\Phi(v^*) = E[\exp(i v^*(u))].$

E. Hopf has given an integro-differential equation defining $\partial\Phi/\partial t$
in terms of two operators, $L_1$ and $L_2$: $L_1$ is a linear differential
operator (functional differentiation with respect to the argument of $\Phi$,
of first order, which arises from the viscosity
terms in the Navier-Stokes equations) and $L_2$ is of second order and
arises from the inertial terms.

Should it be possible to solve this equation, the knowledge of $\Phi$
being equivalent to the knowledge of the probability law of $u(x, t, \omega)$,
this would yield a complete solution of the problem. But today one
is far from this end and we agree with E. Hopf when he writes
[B.7.7, p. 157]: “The obstacles in the way of a rational theory of
highly turbulent fluid flow are still formidable.”

8. Burgers Model. The great mathematical difficulties encountered
in the theory of turbulence seem insuperable today and at
present the way to a successful attack on them seems hopelessly
barred. However, there is no doubt that many characteristic
features of turbulent flows occur in a much larger class of similar
nonlinear equations: “In order to gain insight into the nature of
hydrodynamical phase flows we are, at present, forced to find and to
treat simplified examples within this class” (E. Hopf [B.8.8, p. 304]).
The most popular example is “Burgers model” [B.8.2] whose motion
is defined by one scalar function $u(x, t)$ (in the half-plane $t > 0$,
$-\infty < x < +\infty$) by the nonlinear equation

(8.1)    $u_t + u u_x = \nu u_{xx}$    ($\nu$ = constant > 0).

This equation is the simplest analogue of the Navier-Stokes equations,
insofar as it contains a nonlinear term with a space derivative
and a linear term with a second-order space derivative multiplied
by a factor which can be very small. Since 1940 extensive mathematical
investigations have been carried out. From the beginning
it was observed that the integrals exhibit the property that regions
make their appearance where $|u_x|$ takes very high values (steep
fronts). These can be considered to be analogous to the regions of
very high vorticity in a turbulent flow.

The study of the integrals of the Burgers equation became easier
when it was discovered (E. Hopf, 1946, J. D. Cole, 1949) that it can
be reduced to the heat equation by the transformation

$u(x, t) = -2\nu \frac{\partial}{\partial x} \log w(x, t)$,    $w(x, t) > 0$.
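
The reduction to the heat equation can be checked numerically. The following is only a minimal sketch: the viscosity value, the particular heat solution $w = 1 + a e^{-\nu k^2 t} \cos kx$, and the finite-difference step are all assumptions chosen for illustration, not taken from the text.

```python
import math

NU = 0.1          # viscosity (assumed value)
A, K = 0.5, 2.0   # amplitude and wavenumber of the assumed heat solution

def w(x, t):
    """Positive solution of the heat equation w_t = NU * w_xx."""
    return 1.0 + A * math.exp(-NU * K * K * t) * math.cos(K * x)

def u(x, t):
    """Transformed function u = -2*NU * d/dx log w, written out analytically."""
    decay = A * math.exp(-NU * K * K * t)
    w_x = -decay * K * math.sin(K * x)
    return -2.0 * NU * w_x / w(x, t)

def burgers_residual(x, t, h=1e-4):
    """u_t + u*u_x - NU*u_xx, estimated with central differences of step h."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t + u(x, t) * u_x - NU * u_xx

# Largest residual of (8.1) over a few sample points at t = 0.5.
worst = max(abs(burgers_residual(0.1 * i, 0.5)) for i in range(-30, 31))
print(worst)
```

The residual is small only up to finite-difference and rounding error; it is a consistency check of the transformation, not a proof.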

Starting from this fact, E. Hopf [B.8.9] proved an existence and
uniqueness theorem in the half-plane, even giving an explicit expression
of $u(x, t)$, under the assumption that the initial condition $v(x)$
satisfies

$\int_0^x v(\xi)\, d\xi = o(x^2)$

(which cannot be very much weakened, as is shown by the example
$u = x/(t - \tau)$, which explodes at $t = \tau > 0$).

From the explicit expression of $u(x, t)$, one easily deduces its
asymptotic behavior for $t \to +\infty$ when $v(x) \in L(-\infty, +\infty)$.
Most interesting is the way in which the integral depends on the
parameter $\nu$, and especially the behavior of $u(x, t, \nu)$
when $\nu \downarrow 0$. The connection between this limit and the integrals
of $u_t + u u_x = 0$ is illuminating. It throws great light on the
coalescence of the steep fronts and the appearance of discontinuity lines.

If one wants to use the Burgers equation as a model for turbulence,
the question naturally arises: are some integrals of (8.1) “complicated”
enough to represent a chaotic flow? J. Bass [B.8.1] proved
that $u(x, t)$ can be an almost periodic function of $t$ and, what is
more interesting, a “pseudo-random” (pseudo-aléatoire) function
of $t$.

The Burgers equation being invariant under the translation
$x \to x + h$, the consideration of the spatially homogeneous case
is most natural. Starting from an initial stationary random function
$v(x, \omega)$, the formula established by E. Hopf gives an explicit
expression of the corresponding random integral $u(x, t, \omega)$.
Unfortunately, no statistical property of $u(x, t, \omega)$ has yet been obtained
in this rigorous way.

Burgers concentrates ([B.8.6, B.8.7]) his efforts on the moments

$\overline{u_1^m u_2^n} = E[u(x, t, \omega)^m\, u(x + h, t, \omega)^n]$.

He finds the analog of von Kármán’s equation for the propagation
of correlation in homogeneous turbulence

(8.2)    $\frac{\partial}{\partial t} \overline{u_1 u_2} = \frac{\partial}{\partial h} \overline{u_1^2 u_2} + 2\nu \frac{\partial^2}{\partial h^2} \overline{u_1 u_2}$;

a single equation to determine two functions! But here the situation
is very different, because we know [from the explicit expression
of $u(x, t, \omega)$] that it should be possible to determine $\overline{u_1 u_2}$ without
additional hypotheses about $\overline{u_1^2 u_2}$. It is possible to obtain
asymptotic expressions for these two moments. The most interesting fact
is that the only statistical property of the random function $v(x, \omega)$
which appears in these expressions is the integral

$J = \int_0^{\infty} \overline{u_1 u_2}\, dh$,

which has been shown to be independent of $t$. In spite of the labor
and ingenuity involved in these computations, it must be observed
that the expressions obtained are asymptotic in two respects: the
time $t$ is very large and $\nu$ (which does not appear explicitly in the
results) is very small.

Burgers [B.8.5] has also considered a different problem referring
to the propagation along the $x$-axis of impulses introduced at
$x = 0$ at a series of successive instants. In this case there is no
homogeneity in space, but the system is statistically stationary with
respect to time.

9. Linear Equations; Well-Set Cauchy Problem. The Navier-Stokes
equations (and the simplified Burgers model) belong to the
general class of partial differential equations

(9.1)    $\frac{\partial u_j}{\partial t} = \mathcal{A}_j[u]$

where the $\mathcal{A}_j$ are differential operators in the space derivatives of the
$u_j$; the coefficients do not depend explicitly on the time. This type
of equation appears to be the most usual in the mechanics of continuous
media. The building of a statistical mechanics of fluids is
at present far beyond our reach because the $\mathcal{A}_j$ are nonlinear.
However, continuous media whose motions are defined by linear equations
have been the happy hunting ground of mathematical physics
for a long time (vibrating strings or beams, sound waves, Airy’s
gravity waves, and so on). Thus the question arises: is it possible
to generalize the statistical mechanics of Gibbs to continuous media
whose motions are defined by linear partial differential equations?
We will sketch briefly the results obtained, referring to [B.9.4] for
more details.

(a) The continuous medium fills a domain $D_p$ in Euclidean space
$R^p$: $x = [x_1, \ldots, x_p]$; $D_p$ has a smooth boundary $B_{p-1}$; one can
have $D_p = R^p$.

(b) The state of the medium is defined, at any given time $t$, by a
real-valued $u = [u_1, \ldots, u_q]$ on $D_p$.

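(c) For $0 < t < \tau < +\infty$, $u(x, t)$ satisfies the linear partial
differential equations:

(9.2)    $\frac{\partial u_j}{\partial t} = \sum_{k=1}^{q} P_{j,k}\left(\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_p}\right) u_k$,    $j = 1, \ldots, q$,

the $P_{j,k}$ being polynomials whose coefficients may depend on $x$ but
not on time $t$.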

The determination of the motion corresponds to a typical Cauchy
problem, when the initial conditions and the boundary conditions
are given; the problem is well set (bien posé, J. Hadamard, 1903) if
one can prove an existence, uniqueness, and continuity theorem for
the integral $u(x, t)$. This proof can be achieved only when one has
first defined

(α) the set $\mathcal{U}_0$ of all functions $u_0(x)$ on $D_p$ considered as admissible
initial conditions;

(β) the set $\mathcal{U}_1$ of all functions $u_1(x, t)$ on $B_{p-1} \times (0, \tau)$ considered
as admissible boundary conditions;

(γ) the set $\mathcal{U}$ of all functions $u(x, t)$, taken as functions of $x$ on $D_p$
(for each $t$), considered as regular integrals;

(δ) topologies on the function spaces $\mathcal{U}_0$, $\mathcal{U}_1$, $\mathcal{U}$;

(ε) the precise meaning of the limits

(9.3)    $u(x, t) \to u_1(x, t)$ on $B_{p-1}$

(9.4)    $u(x, t) \to u_0(x)$ on $D_p$ if $t \downarrow 0$.

As a rule this choice is not unique. For instance, in the case of the
heat equation in an infinite rod, in [B.7.13] three completely different
choices are discussed, each one of them leading to a well-set
Cauchy problem. This is of paramount importance because obviously
the statistical mechanics will be strongly influenced by this
initial step. We must not forget our liberty of choice, when later
we are tempted to interpret our results as “laws of nature”; the
“laws” are highly dependent on our initial fiat!

For simplicity we will here consider only the correspondence
between $\mathcal{U}_0$ and $\mathcal{U}$, the boundary conditions appearing only in the
background in most cases. For instance, they often reduce to
$u(x, t) = 0$ on $B_{p-1}$, and they completely disappear if $D_p = R^p$.

Due to the linearity of the differential equation (9.2), we must
obviously consider $\mathcal{U}_0$ and $\mathcal{U}$ as linear topological spaces; the
correspondence between $\mathcal{U}_0$ and $\mathcal{U}$

(9.5)    $u(x, t) = T_t u_0(x)$

has the semigroup property

(9.6)    $T_{t+s} = T_t T_s = T_s T_t$.

This was noted as far back as 1924 by J. Hadamard and is a property
which expresses the Huyghens principle (“principle of scientific
determinism”; see [A.5.5, pp. 388-390]). The correspondence
being continuous, by definition, $T_t$ is a linear continuous transformation
of $\mathcal{U}_0$ onto $\mathcal{U}$; hence the inverse image $T_t^{-1}$ of any open set
(in the topology $\mathcal{T}$ on $\mathcal{U}$) must be an open set in the topology $\mathcal{T}_0$ on
$\mathcal{U}_0$, for any $t \in (0, \tau)$.
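
The semigroup property (9.6) can be illustrated on the heat equation, where the solution operator acts on each Fourier mode as a simple damping factor. This is a minimal sketch under assumptions of our own: a periodic setting, a diffusivity value `NU`, and a hypothetical three-mode initial condition.

```python
import math

NU = 0.5  # diffusivity (assumed value)

def T(t, coeffs):
    """Heat semigroup on Fourier coefficients: mode k is damped by exp(-NU*k^2*t)."""
    return {k: c * math.exp(-NU * k * k * t) for k, c in coeffs.items()}

# Hypothetical initial condition u_0(x) = 1 + 2 cos x - 0.5 cos 3x,
# stored as {wavenumber: cosine coefficient}.
u0 = {0: 1.0, 1: 2.0, 3: -0.5}

t, s = 0.3, 0.7
direct = T(t + s, u0)       # T_{t+s} u_0
composed = T(t, T(s, u0))   # T_t T_s u_0
swapped = T(s, T(t, u0))    # T_s T_t u_0

# Largest disagreement between the three ways of reaching time t + s.
gap = max(abs(direct[k] - composed[k]) + abs(direct[k] - swapped[k]) for k in u0)
print(gap)
```

Calling `T` with a negative time would amplify the high modes without bound as more modes are added, which is one way to see why only a semigroup, not a group, is available here.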

Most often $\mathcal{U}$ is a proper subset of $\mathcal{U}_0$, the regular integrals being
smoother than the initial conditions; thus for the heat equation in
an infinite rod, if $\mathcal{U}_0 = C[-\infty, +\infty]$ or $L_p[-\infty, +\infty]$, then $u(x, t)$
belongs, for each $t > 0$, to the same space, but must be an entire
function of $x$.

Thus the function space $\mathcal{U}_0$ can be taken as phase space $\Omega$. From
each point $\omega$ of $\Omega$ there issues one and only one trajectory representing
the evolution of the continuous medium. Here we have a very
natural generalization of the description of motion of a holonomic
system. Nevertheless we must point to an important difference:
the Hamilton-Jacobi equations (1.1) are invariant if $t$ is replaced by
$-t$ (the past and the future are both determined by the present).
Such is not the case for all partial differential equations (9.2); for
instance, it is not true for equations of parabolic type. This
explains why the group of transformations (1.2) is replaced, in
general, by a semigroup.

The next step is to define probability measures on $\Omega$. The most
natural way seems to be to define a regular probability measure $P$
(see page 296) on the linear topological space $\mathcal{U}_0$; hence, putting

(9.7)    $P_t(e) = P[T_t^{-1} e]$

for all open sets $e$ of $\mathcal{U}$, we construct a $\sigma$-additive measure on the
Borel sets of $\mathcal{U}$. By definition, the Lebesgue completion of this
will be a regular measure in the space of regular integrals $\mathcal{U} \subset \mathcal{U}_0$
[B.9.5]. We obtain the random integrals of (9.2)

$u = u(x, t, \omega)$,

if we assume that $\omega$, which represents the initial conditions $u_0(x)$
in $\Omega = \mathcal{U}_0$, is chosen at random according to the probability law $P$.

In order that the continuous medium be in statistical equilibrium
or, what is the same, that the random integral be a stationary
random function of $t$, we must have

$P_t(e) = P(e)$

for all open sets. We will see in an example that for some parabolic
equations this is not possible, except for trivial measures, giving
the measure one to the random constants $u(\omega)$ independent of
$x$ and $t$.

10. Banach Spaces; Abstract Cauchy Problem. We will now specialize
to linear partial differential equations (9.2) with constant
coefficients.

For a great number of well-set Cauchy problems, corresponding to
systems of linear equations with constant coefficients, one is most
naturally led to take a Banach space for the phase space $\Omega$; the most
usual spaces being $C^{(m)}(D_p)$ and $L_m(D_p)$.

Starting from this remark, E. Hille gave a new formulation of the
problem as an abstract Cauchy problem [B.10.3]. For each $t$, $u(x, t)$
is an element $\omega$ of the Banach space $\Omega$ and the integral corresponding
to $u_0(x)$ is defined by the trajectory originating at $u_0$ under the
action of a suitable semigroup of continuous linear transformations
$T_t$ of $\Omega$ into itself. In [A.5.5] E. Hille developed important applications
to hyperbolic, parabolic, and elliptic equations with constant
coefficients. R. S. Phillips [B.10.7] modified the formulation somewhat,
and in [A.5.6, p. 619] an abstract Cauchy problem is described
as follows: Given a linear operator $\mathcal{A}(\omega)$, with domain $\mathcal{D}(\mathcal{A})$ and
range $\mathcal{R}(\mathcal{A})$ in the Banach space $\Omega$, and given $\omega_0 \in \Omega$, find a function
$\omega(t)$ from $(0, +\infty)$ to $\Omega$ such that:

(a) $\omega(t)$ is strongly absolutely continuous and continuously
differentiable in each finite subinterval of $(0, +\infty)$;

(b) for each $t > 0$, $\omega(t) \in \mathcal{D}(\mathcal{A})$ and

(10.1)    $\frac{d\omega}{dt} = \mathcal{A}[\omega(t)]$;

(c) $\lim_{t \downarrow 0} \|\omega(t) - \omega_0\| = 0$.

We refer to [A.5.6, pp. 619-633] for existence and uniqueness theorems,
and for the properties of the semigroup of transformations
$T_t$ [the so-called $C_0$-type, according to condition (c)]. The differential
operator $\mathcal{A}$ is the infinitesimal generator of a strongly continuous
semigroup of transformations $T_t$ of the $C_0$-type if

(10.2)    $\lim_{h \downarrow 0} \left\| \frac{1}{h}(T_h \omega - \omega) - \mathcal{A}(\omega) \right\| = 0$.
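
The limit (10.2) can be observed directly in a finite-dimensional toy case. The sketch below assumes the phase space is the plane $R^2$ and takes for $T_t$ the rotation group, whose generator is the matrix $[[0, 1], [-1, 0]]$; this choice is ours for illustration and is not one of the function spaces of the text.

```python
import math

def T(t, w):
    """Rotation group T_t on R^2; it solves dw/dt = A w with A = [[0, 1], [-1, 0]]."""
    c, s = math.cos(t), math.sin(t)
    return (c * w[0] + s * w[1], -s * w[0] + c * w[1])

def A(w):
    """Candidate infinitesimal generator of the rotation group."""
    return (w[1], -w[0])

def generator_gap(h, w):
    """Norm of (T_h w - w)/h - A w, the quantity that (10.2) requires to tend to zero."""
    Th = T(h, w)
    Aw = A(w)
    dx = (Th[0] - w[0]) / h - Aw[0]
    dy = (Th[1] - w[1]) / h - Aw[1]
    return math.hypot(dx, dy)

w0 = (1.0, 2.0)
# Gaps for h = 1e-1, 1e-2, ..., 1e-6.
gaps = [generator_gap(10.0 ** -k, w0) for k in range(1, 7)]
print(gaps)
```

The gap shrinks roughly in proportion to $h$, as expected from the Taylor expansion of $T_h$ around $h = 0$.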

The problem of integration of the linear equations (9.2) being
solved as an abstract Cauchy problem, if we define a probability
measure $P$ on the Banach space $\Omega$, we obtain an elegant representation
of the random integrals as random functions $\omega(t)$ from $[0, +\infty)$
to $\Omega$. Most of the observable properties of a continuous medium
being defined by linear functionals, we will suppose that $P$ is an
L-measure (M. Fréchet, E. Mourier), that is, that every linear
functional is $P$-measurable; the connections of the L-measures with
the regular measures are given by [B.9.5, p. 667]:

(a) A measure on a Banach space is an L-measure if, and only if, 
it is an extension of a regular measure for the weak topology. 

(b) A measure on a separable Banach space is an L-measure if, 
and only if, it is an extension of a regular measure in the strong 
topology. 

In all problems of interest in theoretical physics the construction
of an L-measure on the Banach space is greatly simplified by the
existence of a Schauder basis $[e_1, \ldots, e_n, \ldots]$: there is a one-one
correspondence $S$ between the point $\omega$ of $\Omega$ and a sequence of real
numbers $[\eta_1, \ldots, \eta_n, \ldots] \in R^\infty$ such that:

(10.3)    $\lim_{n \to \infty} \|\omega - (\eta_1 e_1 + \eta_2 e_2 + \cdots + \eta_n e_n)\| = 0$.

We start from a sequence of random variables defined by the
sequence of probability distributions

Prob $[\eta_1 < \alpha_1, \ldots, \eta_n < \alpha_n] = F_n(\alpha_1, \ldots, \alpha_n)$.

By a classical theorem (Kolmogorov) this determines a unique
probability measure $\mu$ on a $\sigma$-algebra $\mathfrak{F}$ of subsets of $R^\infty$; then $\Omega$
being, through (10.3), in a one-one correspondence with a subset
$\Omega^*$ of $R^\infty$, $\Omega^* = S\Omega$, we give a necessary and sufficient condition for
$\Omega^* \in \mathfrak{F}$ and $\mu(\Omega^*) = 1$.

The probability measure $P$ is now defined on the $\sigma$-algebra $\mathcal{F}$ of
subsets $A$ of $\Omega$ such that $SA \in \mathfrak{F}^*$ ($\mathfrak{F}^*$ = restriction of $\mathfrak{F}$ to $\Omega^*$) by

$P(A) = \mu(SA)$

and finally, we prove that $P$ is an L-measure (for details we refer to
[B.10.4]).

When, for a continuous medium, the phase space $\Omega$ is a Banach
space with a Schauder basis, the state of the system is thus defined
by a countable number of coordinates $[\eta_1, \ldots, \eta_n, \ldots]$; in this case
one has the most natural extension of Gibbs’ statistical mechanics.

Among the L-measures on a Banach space, the normal measures
are particularly important; these are the measures for which every
linear functional is a normal random variable (M. Fréchet). As
for the canonical distribution of Gibbs, the choice of a normal measure
for the statistical mechanics of a continuous medium is justified
by the principle of maximum information entropy (see [B.10.5]
where we give a proof for the vibrating string).

As a last remark let us point out that in many cases the solution
of the abstract Cauchy problem becomes much easier if one uses
the Fourier-transform technique (which amounts to the replacement
of the Banach space $\Omega$ by another isomorphic Banach space).
Pursuing the ideas of E. Hille [B.10.3], G. Birkhoff and T. W.
Mullikin [B.10.1] have shown how to construct a $C_0$-semigroup for
any well-set Cauchy problem for linear partial differential equations
with constant coefficients. In [B.10.2] G. Birkhoff makes, by means
of the Fourier-transform technique, a thorough study of the case in
which $D_p$ is the product of $s$ real lines and $p - s$ circles, which
represents the most general locally Euclidean Abelian group manifold.

11. Spatial Homogeneity. The linear partial differential equations
(9.2), with constant coefficients, are invariant for any translation
$x \to x + h$ in $R^p$. Thus, when $D_p = R^p$, the class of random integrals
which comes to mind is obviously the class of $u(x, t, \omega)$ which
are stationary with respect to $x$ at each instant $t$.

A regular measure $P$ on a function space, whose topology is
invariant under the translations $x \to x + h$, will be called homogeneous
if $P(B + h) = P(B)$ for any Borel set $B$.

One can prove [B.9.5] the following result: if $P$ is a homogeneous
regular measure on the linear topological space $\mathcal{U}_0$ (initial conditions)
and if $P_t$ is defined (for each $t > 0$) on the linear topological
space $\mathcal{U}$ (regular integrals) by the condition (9.5), then $P_t$ is a
homogeneous regular measure for each $t > 0$. In other words, if
the initial conditions $\omega = u_0(x)$ are spatially homogeneous, then
the random integrals $u(x, t, \omega)$ are stationary with respect to $x$ for
any $t > 0$.

If the equations (9.2) are interpreted as defining the motion of a 
continuous medium, the consideration of the total energy of the 
system is of paramount importance. It is only when this total 
energy is finite that it is natural to choose for the phase space a 
Banach space and to interpret the problem as an abstract Cauchy 
problem. When the total energy is infinite, the method described 
in Section 10 cannot be applied. 

The simplest case occurs for continuous media such that the
energy corresponding to a cube $K$ has the same expression as the
kinetic energy of an incompressible fluid:

$E_K = \int_K |u(x)|^2\, dm(x)$.

Due to spatial homogeneity, the cubes $K$ and $K + h$ have, on the
average, the same energy. Thus, the mean total energy for the
whole space is infinite and we are prohibited from taking as phase
space the Hilbert space $L_2(R^p)$.

It is most natural to define the topology by the countable family of
seminorms

(11.1)    $\|u\|_n = \left[\int_{K_n} |u(x)|^2\, dm(x)\right]^{1/2}$

where the $K_n$, $n = 1, 2, \ldots$, are the cubes $\{x : |x_j| < n,\ j = 1,
2, \ldots, p\}$. This linear topological space is precisely the space $A$,
introduced on page 296. Alternatively, one can also define $A$ as the
unrestricted direct topological product of countably many Hilbert
spaces $L_2(A_n)$, $\{A_1, \ldots, A_n, \ldots\}$ being a countable partition of
$R^p$ into compact domains [B.7.3, p. 668]. This observation indicates
that, even though $A$ is not a Banach space, a point of $A$ can
nonetheless be defined by a countable number of coordinates.
Thus, exactly as in Section 10, the continuous media having $A$ as
phase space lead to the most simple generalization of the statistical
mechanics of Gibbs to systems with a countable number of degrees
of freedom.

Finally, let us note that $A$ is complete, locally convex, and metrizable
(separable Fréchet space).

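Now let us sketch briefly the simplest instance of stationary
random integrals in $A$. For $p = q = 1$, the equations (9.2) reduce
to

(11.2)    $\frac{\partial u}{\partial t} = Q\left(\frac{\partial}{\partial x}\right) u$.

We suppose that the polynomial $Q$ satisfies the condition

$\lim_{|\kappa| \to \infty} Q(i\kappa) = -\infty$,    $\kappa$ real,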
which means that the linear differential equation is of the parabolic
type. Now we shall define a probability measure $P$ on $A$. Let us
take an arbitrary absolutely continuous energy spectrum

(11.3)    $dS(\kappa) = S'(\kappa)\, d\kappa$,    $S' > 0$ and $S' \in L(0, +\infty)$.

From the uniqueness theorem [B.7.3, p. 698], we know that $S(\kappa)$
determines a unique normal homogeneous admissible regular measure
$P$. Let us assume that the sample functions $\omega = \omega(x)$ (the initial
conditions) are chosen at random in $A$ according to

Prob $[\omega \in A] = P(A)$.

The function

$G(x, t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \exp[i\kappa x + tQ(i\kappa)]\, d\kappa$,    $t > 0$,

being the fundamental integral or the Green's function of (11.2),
the formula

(11.4)    $u(x, t, \omega) = \int_{-\infty}^{+\infty} G(x - \xi, t)\, \omega(\xi)\, d\xi$,    $t > 0$,

then defines a random integral of (11.2) for almost all $\omega$. It is an
analytic function of $(x, t)$ and its derivatives satisfy (11.2) (literal or
smooth integral according to the current nomenclature).

The random integral $u(x, t, \omega)$ is stationary with respect to $x$ for
each $t > 0$, and it has an absolutely continuous spectrum

$dS_u(\kappa) = S'(\kappa) \exp[2t\, \mathrm{Re}\, Q(i\kappa)]\, d\kappa$.
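
The effect of this formula on the energy spectrum can be tabulated numerically. The sketch below assumes the heat equation $u_t = u_{xx}$, for which $Q(z) = z^2$ and $Q(i\kappa) = -\kappa^2$, together with a hypothetical initial density $S'(\kappa) = e^{-\kappa}$; the function names and the cutoff `kappa_max` are illustrative choices only.

```python
import math

def evolved_density(S_prime, Q, kappa, t):
    """Spectral density at time t: S'(kappa) * exp(2 t Re Q(i kappa))."""
    return S_prime(kappa) * math.exp(2.0 * t * Q(1j * kappa).real)

def total_energy(S_prime, Q, t, kappa_max=40.0, n=40000):
    """Midpoint-rule estimate of the integral of the evolved density over (0, kappa_max)."""
    dk = kappa_max / n
    return sum(evolved_density(S_prime, Q, (i + 0.5) * dk, t) for i in range(n)) * dk

def heat(z):
    """Polynomial Q for u_t = u_xx, so that Q(i kappa) = -kappa^2."""
    return z * z

def initial(kappa):
    """Hypothetical initial spectral density, integrable on (0, +infinity)."""
    return math.exp(-kappa)

# Total spectral mass at a few times.
energies = [total_energy(initial, heat, t) for t in (0.0, 0.5, 1.0, 2.0)]
print(energies)
```

Every wavenumber $\kappa \neq 0$ is damped as $t$ grows, so the total spectral mass decreases with time in this example.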

Here we have an example of a statistically well-set Cauchy problem.
The space $A$ is an enormous bag containing far too many functions,
but by using our statistics as a sieve, we discard many of them. The
initial function being submitted only to the condition $\omega(x) \in L_2[a, b]$
for any finite $[a, b]$, one can show that (11.4) has no meaning in
general; but among all the functions $\omega(x) \in A$ we select a particular
set $A^*$, to which our statistics gives measure one,

$A^* = \left\{\omega : \frac{\omega(x)}{\sqrt{1 + x^2}} \in L_2(-\infty, +\infty)\right\}$,

and for these special initial conditions the integral (11.4) gives the
solution of the Cauchy problem. All the other functions are
unimportant statistically because $P(A - A^*) = 0$. The special set
$A^*$ is singled out through metric transitivity, which is realized here
as a consequence of the absolute continuity of the spectrum $S(\kappa)$
[B.7.3, p. 331]. Due to this metric transitivity,

$\lim_{A \to +\infty} \frac{1}{2A} \int_{-A}^{A} \omega(x)^2\, dx = S(+\infty) < +\infty$

for almost all $\omega$. But it is known [C.4, p. 138] that if, for a function
$f(x)$, the integral

$\frac{1}{2A} \int_{-A}^{A} f(x)^2\, dx$

is bounded for $A \to +\infty$, then $f(x)/\sqrt{1 + x^2} \in L_2(-\infty, +\infty)$.

Thus for any normal homogeneous regular measure $P$ on $A$ one has

$P(A^*) = 1$.

Now a question arises which is of great importance for statistical
mechanics. The random integral (11.4) is stationary with respect
to $x$. May we choose the measure $P$ so that $u(x, t, \omega)$ will also be
stationary with respect to $t$? The answer is: no, except in trivial
situations. For instance, for the heat equation [B.9.4, p. 180],
$u(x, t, \omega)$ can be stationary with respect to $x$ and $t$ only if $P$ gives
measure one to the set of constant initial functions $\omega(x) \equiv a$.
This result points to a major difference between parabolic and
hyperbolic equations. For instance, there are large classes of
random integrals of the wave equations which are at the same time
stationary with respect to $x$ and $t$ (for example, sound waves [B.9.4,
p. 193]). For an ensemble of systems whose motion is defined by
a hyperbolic equation, it is always possible to realize statistical
equilibrium. This is not true for systems corresponding to a parabolic
equation, except in the trivial case when each system of the
statistical ensemble is itself in equilibrium. This remark could
have far-reaching consequences in physics.

12. Conclusions. The reader has certainly observed that, in the
preceding sections, we have always looked through a window opening
on a very limited horizon. From the beginning we have been,
so to speak, polarized by statistical mechanics. We must now stress
that there are many other ways leading to random integrals of
partial differential equations, and most of these ways lie on virgin
soil.

For instance, let us take the random integrals of Laplace’s 
equation in the unit circle. 

(1) One can take the Dirichlet problem: The harmonic function
in the circle $r < 1$ is determined by its boundary values $\omega(\theta)$,
$0 < \theta < 2\pi$, through the Poisson or Poisson-Stieltjes integral.
Here one is almost in the same situation as in the Cauchy problem
of the preceding sections. The boundary condition plays the same
role as the initial condition; a probability measure on the set $\mathcal{U}_1$ of
the boundary conditions induces a measure on the set of harmonic
functions [B.11.3].

(2) The harmonic function being defined by the series

$\frac{1}{2} a_0 + \sum_{n=1}^{\infty} r^n (a_n \cos n\theta + b_n \sin n\theta)$,

one assumes that the coefficients are random variables $a_n(\omega)$, $b_n(\omega)$
such that

Prob $[\limsup |a_n|^{1/n} = 1] = 1$ and Prob $[\limsup |b_n|^{1/n} = 1] = 1$.

This is a completely different way of defining a random harmonic
function in the circle $r < 1$.
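
A sample of such a random harmonic function can be generated from a truncated series. The sketch below assumes independent standard normal coefficients (a choice for which the lim sup condition is known to hold with probability one) and keeps only the first 20 terms; the truncation, the seed, and the function names are illustrative assumptions.

```python
import math
import random

def sample_coefficients(n_terms, rng):
    """Draw a_0..a_N and b_1..b_N as independent standard normal variables (an assumption)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n_terms + 1)]
    b = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return a, b

def harmonic(r, theta, a, b):
    """Partial sum a_0/2 + sum over n of r^n (a_n cos n theta + b_n sin n theta)."""
    total = 0.5 * a[0]
    for n in range(1, len(a)):
        total += r ** n * (a[n] * math.cos(n * theta) + b[n] * math.sin(n * theta))
    return total

rng = random.Random(0)
a, b = sample_coefficients(20, rng)

# Mean-value property: the average over a circle of radius r < 1
# equals the value at the center.
m = 64
r = 0.8
circle_mean = sum(harmonic(r, 2 * math.pi * j / m, a, b) for j in range(m)) / m
center = harmonic(0.0, 0.0, a, b)
print(circle_mean, center)
```

The mean-value check only confirms that each sample is harmonic; it says nothing about the boundary behavior, which is where the lim sup condition matters.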

(3) Several different types of Banach spaces of harmonic functions
have been considered (see [B.12.4]). On one of them, $\mathcal{B}$,
define an L-measure. The random element $u$ in $\mathcal{B}$ corresponding to
this measure constitutes a new type of random integral of the
Laplace equation.

(4) One can replace the Banach space by a linear topological
space analogous to the space $A$. On the set of harmonic functions
in the unit circle $\gamma$, define a topology by the countable family of
seminorms

$\|u\|_n = \left[\iint_{\gamma_n} |u(r, \theta)|^2\, r\, dr\, d\theta\right]^{1/2}$,

$[\gamma_1, \ldots, \gamma_n, \ldots]$ being an increasing sequence of circles
$\gamma_n \subset \gamma_{n+1}$, $\gamma_n \uparrow \gamma$.

We do not know of any research exploring the last two fields.

I shall not endeavor to discuss the connections between the random
integrals considered in this survey and the research made,
more or less, in the spirit of the Monte-Carlo method. In the latter
case one explores integrals of partial differential equations (for
instance harmonic functions) by means of a random walk or of
a Brownian motion path. The papers [B.12.1, B.12.2, and B.12.3]
by J. L. Doob are among the most typical of this point of view.
Their results, for instance in an abstract theory of boundary value
problems, possibly have certain connections with some of the ideas
related here; but the task of analyzing them would prove overwhelming
for the time allotted to this survey and for my own
limitations.


C. APPENDIX 

I. In the usual terminology of measure theory, a probability
space is a measure space such that $P(\Omega) = 1$.

A random variable $x(\omega)$ is a $P$-measurable function from $\Omega$ to a
space $\mathcal{X}$. Here we take for $\mathcal{X}$ the real line $R$ and exceptionally the
complex plane; the only exception is Section 9, where $\mathcal{X}$ is a Banach
space.

Given a set $J$, a random function on $J$ is a set of random variables
$x(t, \omega)$ defined for all $t \in J$. Here one always has $J \subset R^p$. In
this note we will develop mostly the case $J \subset R$. For a fixed $\omega$ the
function $x(t, \omega)$ from $J$ to $R$ is a sample (or a realization) of the
random function.

Taking for $\Omega$ a given set of functions $\omega(t)$, $t \in J$, and defining a
probability measure $P$ on $\Omega$, one obtains a random function $x(t, \omega)$ whose
samples are the given functions $\omega(t)$. One says that $x(t, \omega)$ is of the
function space type.

The probability law of $x(t, \omega)$ is defined by the set of functions

(C.1)    $F_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n) = P\{\omega : x(t_1, \omega) < \xi_1, \ldots, x(t_n, \omega) < \xi_n\}$,

$n = 1, 2, \ldots$; $\{t_1, \ldots, t_n\} \subset J$; $\xi_1, \ldots, \xi_n$ arbitrary.

The expectation of a random variable is, by definition,

$E[x(\omega)] = \int_\Omega x(\omega)\, dP$    if $x \in L(\Omega)$.

If $x(t, \omega) \in L_2(\Omega)$ for $t \in J$, the mean and covariance are defined by

(C.2)    $m(t) = E[x(t, \omega)]$

(C.3)    $\Gamma(t, s) = E[[x(t, \omega) - m(t)][x(s, \omega) - m(s)]]$.

A random function is normal if its probability law (C.1) is normal
(= Gaussian). The mean and covariance completely determine the
probability law.

If one knows only the probability law of a random function, one
finds that the probabilities that the samples are bounded in an
interval, or continuous, or differentiable, etc., may not be defined
because the corresponding $\omega$ set may not be measurable; to cope
with this major difficulty the concept of separability has been
introduced (J. L. Doob). Separability means roughly that the samples
are as regular as their restriction to some countable dense subset of
$J$. The random function $x(t, \omega)$ is called separable if there exists a
sequence $\{t_1, \ldots, t_n, \ldots\}$ dense in $J$ and a set $N \subset \Omega$ of
probability 0, such that for any closed interval $[a, b] \subset R$ and any open
interval $(\alpha, \beta) \subset J$ one has

$\{\omega : x(t_j, \omega) \in [a, b],\ \forall t_j \in (\alpha, \beta)\} - \{\omega : x(t, \omega) \in [a, b],\ \forall t \in (\alpha, \beta)\} \subset N$.

The great interest of this definition arises from the proposition:
Given any random function $x(t, \omega)$ there exists a separable random
function $\tilde{x}(t, \omega)$ such that for each fixed $t$:

$P\{\omega : \tilde{x}(t, \omega) = x(t, \omega)\} = 1$.

$\tilde{x}(t, \omega)$ is called the separable version of $x(t, \omega)$; obviously $\tilde{x}(t, \omega)$ has
the same probability law as $x(t, \omega)$. When we say that a given
random function (for example, the integral of a differential equation)
is continuous, this is always meant implicitly for the separable
version.

A random function is measurable if $x(t, \omega)$ is measurable with
respect to the product measure $m \times P$ ($m$ = Lebesgue measure
on $J$).

The condition

(C.4)    $\lim_{s \to t} [\Gamma(t, t) + \Gamma(s, s) - 2\Gamma(t, s)] = 0$,    $t \in J$, $s \in J$,

implies that the separable version $\tilde{x}(t, \omega)$ is measurable. If $x(t, \omega)$
is measurable and if $\Gamma(t, t) \in L(J)$, then almost all samples belong
to $L_2(J)$.

II. A random function is stationary if its probability law is
invariant for any translation:

$F_n(\xi_1, \ldots, \xi_n; t_1 + h, \ldots, t_n + h) = F_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n)$

for all $h$.

If $x(t, \omega) \in L_2(\Omega)$, then

$\Gamma(t, s) = R(t - s)$.

$R$ is called the correlation. The continuity of $R(h)$ for $h = 0$ implies
its uniform continuity in any finite interval and one has

$R(h) = \int_0^{+\infty} \cos \kappa h\, dS(\kappa)$.

The nondecreasing and bounded function $S(\kappa)$ is called the energy
spectrum of $x(t, \omega)$.

For a normal random function, the absolute continuity of $S(\kappa)$
implies metric transitivity and one has:

$\lim_{A \to +\infty} \frac{1}{2A} \int_{-A}^{A} x(t, \omega)\, x(t + h, \omega)\, dt = R(h)$

for almost all samples.

III. A random function is of the Markov type (a Markov process)
if the conditional probability

$P\{\omega : x(t, \omega) < \eta \mid x(s_1, \omega) = \xi_1, \ldots, x(s_n, \omega) = \xi_n, x(s, \omega) = \xi\}$

(for any $s_1 < s_2 < \cdots < s_n < s < t$) is equal to the conditional
probability

$P\{\omega : x(t, \omega) < \eta \mid x(s, \omega) = \xi\} = P(s, \xi, t, \eta)$,    $s < t$.

When the present $x(s, \omega)$ is known, the knowledge of the past
$x(s_1, \omega), \ldots, x(s_n, \omega)$ does not add any information on the future
$x(t, \omega)$. The function $P(s, \xi, t, \eta)$ is called the transition probability,
and the knowledge of the transition probability and of the initial
probability $P\{\omega : x(0, \omega) < \xi\} = P_0(\xi)$ completely determines the
probability law of the Markov process in $[0, +\infty)$.

The Huyghens principle for a Markov process is expressed by
the Chapman-Kolmogorov equation

$P(s, \xi, t, \eta) = \int_{-\infty}^{+\infty} P(\tau, \zeta, t, \eta)\, d_\zeta P(s, \xi, \tau, \zeta)$,    $s < \tau < t$.
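
The Chapman-Kolmogorov equation can be checked numerically for a Gaussian transition law. The sketch below works with transition densities (the derivative in $\eta$ of the transition probability) rather than with distribution functions, and assumes the Gaussian density with variance $t - s$ that belongs to the Wiener-Lévy function of part IV; the integration window and grid are illustrative choices.

```python
import math

def density(s, xi, t, eta):
    """Gaussian transition density from (s, xi) to (t, eta), with variance t - s."""
    var = t - s
    return math.exp(-(eta - xi) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def chapman_kolmogorov_rhs(s, xi, tau, t, eta, half_width=12.0, n=4800):
    """Trapezoidal estimate of the integral over zeta of p(s,xi;tau,zeta) p(tau,zeta;t,eta)."""
    step = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        zeta = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * density(s, xi, tau, zeta) * density(tau, zeta, t, eta)
    return total * step

# Direct transition from time 0 to time 1 versus passage through tau = 0.4.
lhs = density(0.0, 0.3, 1.0, -0.2)
rhs = chapman_kolmogorov_rhs(0.0, 0.3, 0.4, 1.0, -0.2)
print(lhs, rhs)
```

The two numbers agree up to quadrature error, whatever intermediate time $\tau$ between $s$ and $t$ is chosen.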

IV. One says that a random function has independent increments
if for any set of nonoverlapping intervals $(s_1, t_1], \ldots, (s_n, t_n]$,

$s_1 < t_1 < s_2 < t_2 < \cdots < t_{n-1} < s_n < t_n$,

the increments

$x(t_1, \omega) - x(s_1, \omega), \ldots, x(t_n, \omega) - x(s_n, \omega)$

are independent random variables. The Wiener-Lévy function
or Brownian process is one of the most important examples. The
random function $W(t, \omega)$ is defined by the conditions

(a) $W(0, \omega) = 0$

(b) $W(t, \omega)$ has independent increments

(c) $W(t, \omega) - W(s, \omega)$ follows the normal law with

$E[W(t, \omega) - W(s, \omega)] = 0$

$E[(W(t, \omega) - W(s, \omega))^2] = t - s$,    $0 \le s < t$.

One deduces from these conditions that $W(t, \omega)$ is a normal random
function and that

$m(t) = E[W(t, \omega)] = 0$

$\Gamma(t, s) = E[W(t, \omega) W(s, \omega)] = \inf(t, s)$;

moreover, $W(t, \omega)$ is a Markov process.
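
The covariance $\inf(t, s)$ can be observed in a small simulation built directly from conditions (a) to (c). This is a rough Monte Carlo sketch: the sample size, the seed, and the two instants $s = 0.5$, $t = 1$ are arbitrary assumptions, and the estimates carry sampling error.

```python
import math
import random

def sample_pair(s, t, rng):
    """Return (W(s), W(t)) for 0 < s < t, built from two independent normal increments."""
    w_s = rng.gauss(0.0, math.sqrt(s))
    w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
    return w_s, w_t

rng = random.Random(12345)
s, t, n_paths = 0.5, 1.0, 20000
pairs = [sample_pair(s, t, rng) for _ in range(n_paths)]

# Empirical mean of W(t) and empirical covariance of W(s) and W(t).
mean_t = sum(w_t for _, w_t in pairs) / n_paths
cov_st = sum(w_s * w_t for w_s, w_t in pairs) / n_paths
print(mean_t, cov_st)
```

With this many samples the empirical covariance should lie close to $\inf(t, s) = 0.5$, within a few hundredths.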

Almost all samples of $W(t, \omega)$ are continuous in any finite interval,
but are not of bounded variation; $W(t, \omega)$ has no derivative; more
precisely each sample has a finite derivative for at most a set of $t$ of
measure zero. One has, for almost all $\omega$,

$\limsup_{h \downarrow 0} \frac{|W(t + h, \omega) - W(t, \omega)|}{\sqrt{2h \log|\log h|}} = 1$.

In [C.1] (1957) we gave a representation of $W(t, \omega)$ in $[0, 1]$ by a
series of triangular functions $e_n(t)$ [Schauder basis for $C[0, 1]$] which
converges almost surely absolutely and uniformly in $t$:

(C.5)    $W(t, \omega) = \sum_{n=0}^{+\infty} \eta_n(\omega)\, e_n(t)$

where the $\eta_n$ are independent normal random variables:

$E(\eta_n) = 0$,    $E(\eta_0^2) = 1$,    $E(\eta_n^2) = 2^{-(q+1)}$,    $2^{q-1} \le n \le 2^q - 1$.

From this representation it is easy to develop an elementary
theory of the Wiener-Lévy function [C.2] and in particular [C.3] to
give a definition of the pseudo-Stieltjes integrals ($W$ is not of
bounded variation!)

$\int_0^1 F(t)\, dW(t, \omega)$    and    $\int_0^{+\infty} F(t)\, dW(t, \omega)$

by an almost surely convergent series;

for instance, if $F(t) \in L_2[0, 1]$ we define the integral

$X(\omega) = \int_0^1 F(t)\, dW(t, \omega)$

as the sum of the series:

$X(\omega) = \sum_{n=0}^{+\infty} 2^{(q+1)/2} a_n \eta_n(\omega)$

where the $a_n$ are the Fourier coefficients of $F(t)$ with respect to the
Haar functions $h_n(t)$ (orthonormal basis in $L_2[0, 1]$, connected with
the Schauder basis in $C[0, 1]$ by: $e_n(t) = 2^{(q+1)/2} \int_0^t h_n(s)\, ds$); the
series is almost surely convergent and one has:

$E(X) = 0$,    $E(X^2) = \sum_{n=0}^{+\infty} a_n^2 = \int_0^1 F(t)^2\, dt$.
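
The identity for $E(X^2)$ is the Parseval relation for the Haar system and can be checked on a simple integrand. The sketch below assumes $F(t) = t$, indexes the Haar functions so that $2^{q-1} \le n \le 2^q - 1$ at level $q$ with $h_0 = 1$, and truncates the series after level 8; the grid size and function names are illustrative choices.

```python
import math

def haar(n, t):
    """Haar function h_n on [0, 1): h_0 = 1; for 2^(q-1) <= n <= 2^q - 1 it equals
    +2^((q-1)/2) on the first half of its dyadic support and the opposite on the second."""
    if n == 0:
        return 1.0
    q = n.bit_length()          # level q, so that 2^(q-1) <= n <= 2^q - 1
    k = n - 2 ** (q - 1)        # position of the support within level q
    length = 2.0 ** -(q - 1)    # length of the support
    left = k * length
    if left <= t < left + length / 2:
        return 2.0 ** ((q - 1) / 2)
    if left + length / 2 <= t < left + length:
        return -(2.0 ** ((q - 1) / 2))
    return 0.0

def haar_coefficients(F, n_max, grid=4096):
    """Midpoint-rule estimates of a_n = integral of F(t) h_n(t) dt for n = 0..n_max."""
    ts = [(i + 0.5) / grid for i in range(grid)]
    return [sum(F(t) * haar(n, t) for t in ts) / grid for n in range(n_max + 1)]

def F(t):
    """Hypothetical integrand F(t) = t, whose squared integral over [0, 1] is 1/3."""
    return t

a = haar_coefficients(F, 255)
variance = sum(c * c for c in a)   # truncated sum of a_n^2, approximating E(X^2)
print(variance)
```

The truncated sum approaches $\int_0^1 t^2\, dt = 1/3$ from below as more levels are included, which is the variance the text assigns to $X$.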